SlideShare a Scribd company logo

Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration

Seokhwan Kim
Seokhwan Kim
Seokhwan KimScientist at Institute for Infocomm Research

Poster presented at ICASSP 2019

Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration

1 of 1
Download to read offline
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions
for Punctuation Restoration
Seokhwan Kim
Adobe Research, San Jose, CA, USA
Punctuation Restoration
Aims to generate the punctuation marks from the
unpunctuated ASR outputs
A sequence labelling task
to predict the punctuation label yt for the t-th timestep
in a given word sequence X = {x1 · · · xt · · · xT }
where C ∈ {‘, COMMA , ‘.PERIOD , ‘?QMARK }
yt =



c ∈ C if a punctuation symbol c is located
between xt−1 and xt,
O otherwise,
Examples
Input: so if we make no changes today what does tomorrow
look like well i think you probably already have the picture
Output: so , if we make no changes today , what does
tomorrow look like ? well , i think you probably already have
the picture .
t xt yt
1 so O
2 if ,COMMA
3 we O
4 make O
5 no O
6 changes O
7 today O
8 what ,COMMA
9 does O
10 tomorrow O
11 look O
t xt yt
12 like O
13 well ?QMARK
14 i ,COMMA
15 think O
16 you O
17 probably O
18 already O
19 have O
20 the O
21 picture O
22 </s> .PERIOD
Related Work
Multiple fully-connected hidden layers [1]
Convolutional neural networks (CNNs) [2]
Recurrent neural networks (RNNs) [4]
RNNs with neural attention mechanisms [3]
Main Contributions
Context encoding from the stacked recurrent layers to
learn more hierarchical aspects
Neural attentions applied on every intermediate
hidden layer to capture the layer-wise features
Multi-head attentions instead of a single attention
function to diversify the layer-wise attentions further
Methods
Model Architecture
yt
st-1st-1
stst
∙∙∙∙∙∙
n-bidirectional recurrent layers
AttentionAttention AttentionAttention AttentionAttention
m-heads mm mm
Attention Attention Attention
m-heads m m
x1 ∙∙∙h1
1h1
1 h2
1h2
1 hn
1hn
1x1 ∙∙∙h1
1 h2
1 hn
1
xT ∙∙∙h1
Th1
T h2
Th2
T hn
Thn
TxT ∙∙∙h1
T h2
T hn
T
h2
th2
txt ∙∙∙h1
th1
t hn
thn
th2
txt ∙∙∙h1
t hn
t
∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙∙∙∙
∙∙∙
∙∙∙
∙∙∙
∙∙∙
Word Embedding
Each word xt in a given input sequence X is represented as a
d-dimensional distributed vector xt ∈ Rd
Word embeddings can be learned from scratch with random
initialization or fine-tuned from pre-trained word vectors
Bi-directional Recurrent Layers
Stacking multiple recurrent layers on top of each other
−→
h i
t =



GRU xt,
−→
h 1
t−1 if i = 1,
GRU hi−1
t ,
−→
h i
t−1 if i = 2, · · · n,
The backward state
←−
h i
t is computed also with the same way
but in the reverse order from the end of the sequence T to t.
Both directional states are concatenated into the output state
hi
t =
−→
h i
t,
←−
h i
t ∈ R2d
to represent both contexts together
Uni-directional Recurrent Layer
The top-layer outputs [hn
1, · · · , hn
T ] are forwarded to a
uni-directional recurrent layer
st = GRU (hn
t , st−1) ∈ Rd
,
Multi-head Attentions
The hidden state st constitutes a query to neural attentions
based on scaled dot-product attention
Attn (Q, K, V) = softmax
QKT
√
d
V,
Multi-head attentions are applied for every layer separately to
compute the weighted contexts from different subspaces
fi,j
= Attn S · Wi,j
Q , Hi
· Wi,j
K , Hi
· Wi,j
V ,
where S = [s1; s2; · · · ; sT ] and Hi
= hi
1; hi
2; · · · ; hi
T
Softmax Classification
yt = softmax st, f1,1
t , · · · , fn,m
t Wy + by
Experimental Setup
IWSLT dataset
English reference transcripts of TED Talks
2.1M, 296k, and 13k words for training,
development, and test on reference, respectively
Model Configurations
Number of recurrent layers: n ∈ {1, 2, 3, 4}
Number of attention heads: m ∈ {1, 2, 3, 4, 5}
Attended contexts: layer-wise or only on top
Implementation Details
Word embeddings: 50d GloVe on Wikipedia
Optimizer: Adam optimizer
Loss: Negative log likelihood loss
Hidden dimension d = 256
The same dimension across all the components
Except the word embedding layer
Mini-batch size: 128
Dropout rate: 0.5
Early stopping within 100 epochs
Results
Comparisons of the performances with
different network configurations
Single-head
Top-layer
Single-head
Layer-wise
Multi-head
Top-layer
Multi-head
Layer-wise
0.62
0.63
0.64
0.65
0.66
0.67
0.68
F-measure
1 layer 2 layers 3 layers 4 layers
Results
COMMA PERIOD QUESTION OVERALL
Models P R F P R F P R F P R F SER
DNN-A [1] 48.6 42.4 45.3 59.7 68.3 63.7 - - - 54.8 53.6 54.2 66.9
CNN-2A [1] 48.1 44.5 46.2 57.6 69.0 62.8 - - - 53.4 55.0 54.2 68.0
T-LSTM [2] 49.6 41.4 45.1 60.2 53.4 56.6 57.1 43.5 49.4 55.0 47.2 50.8 74.0
T-BRNN [3] 64.4 45.2 53.1 72.3 71.5 71.9 67.5 58.7 62.8 68.9 58.1 63.1 51.3
T-BRNN-pre [3] 65.5 47.1 54.8 73.3 72.5 72.9 70.7 63.0 66.7 70.0 59.7 64.4 49.7
Single-BiRNN [4] 62.2 47.7 54.0 74.6 72.1 73.4 67.5 52.9 59.3 69.2 59.8 64.2 51.1
Corr-BiRNN [4] 60.9 52.4 56.4 75.3 70.8 73.0 70.7 56.9 63.0 68.6 61.6 64.9 50.8
DRNN-LWMA 63.4 55.7 59.3 76.0 73.5 74.7 75.0 71.7 73.3 70.0 64.6 67.2 47.3
DRNN-LWMA-pre 62.9 60.8 61.9 77.3 73.7 75.5 69.6 69.6 69.6 69.9 67.2 68.6 46.1
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
so
if
wemake
nochangestodaywhatdoestomorrow
look
likewell
ithink
youprobablyalreadyhave
thepicture
so
if
we
make
no
changes
today
what
does
tomorrow
look
like
well
i
think
you
probably
already
have
the
picture
References
[1] Xiaoyin Che, ChengWang, Haojin Yang, and Christoph Meinel, ‘Punctuation prediction for unsegmented transcript based on word vector’, LREC 2016.
[2] Ottokar Tilk and Tanel Alumae, ‘LSTM for punctuation restoration in speech transcripts’, INTERSPEECH 2015.
[3] Ottokar Tilk and Tanel Alumae, ‘Bidirectional recurrent neural network with attention mechanism for punctuation restoration’, INTERSPEECH 2016.
[4] Vardaan Pahuja et al., ‘Joint learning of correlated sequence labelling tasks using bidirectional recurrent neural networks’, INTERSPEECH 2017.
345 Park Avenue, San Jose, CA 95110, USA Email: seokim@adobe.com

Recommended

More Related Content

What's hot

Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法Kai Katsumata
 
Single source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstraSingle source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstraRoshan Tailor
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisAmrinder Arora
 
Prim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treePrim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treeoneous
 
Generalized Nonlinear Models in R
Generalized Nonlinear Models in RGeneralized Nonlinear Models in R
Generalized Nonlinear Models in Rhtstatistics
 
Dijkstra's algorithm
Dijkstra's algorithmDijkstra's algorithm
Dijkstra's algorithmgsp1294
 
Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solutioncheekeong1231
 
From L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized ModelsFrom L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized Modelshtstatistics
 
gnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Modelsgnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Modelshtstatistics
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Shiang-Yun Yang
 
Multiplicative Interaction Models in R
Multiplicative Interaction Models in RMultiplicative Interaction Models in R
Multiplicative Interaction Models in Rhtstatistics
 
Deep dive into rsa
Deep dive into rsaDeep dive into rsa
Deep dive into rsaBill GU
 
Prim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithmPrim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithmAcad
 
lecture 30
lecture 30lecture 30
lecture 30sajinsc
 
Justesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codesJustesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codesMadhumita Tamhane
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithmguest862df4e
 

What's hot (20)

Biconnectivity
BiconnectivityBiconnectivity
Biconnectivity
 
Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法Linear Cryptanalysis Lecture 線形解読法
Linear Cryptanalysis Lecture 線形解読法
 
Single source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstraSingle source stortest path bellman ford and dijkstra
Single source stortest path bellman ford and dijkstra
 
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisEuclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis
 
Prim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treePrim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning tree
 
Generalized Nonlinear Models in R
Generalized Nonlinear Models in RGeneralized Nonlinear Models in R
Generalized Nonlinear Models in R
 
Dijkstra's algorithm
Dijkstra's algorithmDijkstra's algorithm
Dijkstra's algorithm
 
Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solution
 
From L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized ModelsFrom L to N: Nonlinear Predictors in Generalized Models
From L to N: Nonlinear Predictors in Generalized Models
 
Merge Sort
Merge SortMerge Sort
Merge Sort
 
gnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Modelsgnm: a Package for Generalized Nonlinear Models
gnm: a Package for Generalized Nonlinear Models
 
Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)Aaex4 group2(中英夾雜)
Aaex4 group2(中英夾雜)
 
Multiplicative Interaction Models in R
Multiplicative Interaction Models in RMultiplicative Interaction Models in R
Multiplicative Interaction Models in R
 
Deep dive into rsa
Deep dive into rsaDeep dive into rsa
Deep dive into rsa
 
Prim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithmPrim Algorithm and kruskal algorithm
Prim Algorithm and kruskal algorithm
 
Sns pre sem
Sns pre semSns pre sem
Sns pre sem
 
lecture 30
lecture 30lecture 30
lecture 30
 
Justesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codesJustesen codes alternant codes goppa codes
Justesen codes alternant codes goppa codes
 
Lec10
Lec10Lec10
Lec10
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithm
 

Similar to Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration

Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Universitat Politècnica de Catalunya
 
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxDivide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxjacksnathalie
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingDongang (Sean) Wang
 
Skiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distanceSkiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distancezukun
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive modelsAkira Tanimoto
 
2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptxAnum Zehra
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptxKokilaK25
 
machinelearning project
machinelearning projectmachinelearning project
machinelearning projectLianli Liu
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingzukun
 
Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...Vladimir Bakhrushin
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 
LeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdfLeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdfzupsezekno
 
Chapter 06 rsa cryptosystem
Chapter 06   rsa cryptosystemChapter 06   rsa cryptosystem
Chapter 06 rsa cryptosystemAnkur Choudhary
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationAlexander Litvinenko
 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMsDaniel Perez
 

Similar to Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration (20)

Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
Attention is all you need (UPC Reading Group 2018, by Santi Pascual)
 
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxDivide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
 
Machine Learning 1
Machine Learning 1Machine Learning 1
Machine Learning 1
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processing
 
2_1 Edit Distance.pptx
2_1 Edit Distance.pptx2_1 Edit Distance.pptx
2_1 Edit Distance.pptx
 
Skiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distanceSkiena algorithm 2007 lecture17 edit distance
Skiena algorithm 2007 lecture17 edit distance
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive models
 
2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptx
 
machinelearning project
machinelearning projectmachinelearning project
machinelearning project
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracing
 
Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...Identification of the Mathematical Models of Complex Relaxation Processes in ...
Identification of the Mathematical Models of Complex Relaxation Processes in ...
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
LeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdfLeetCode Solutions In Java .pdf
LeetCode Solutions In Java .pdf
 
Chapter 06 rsa cryptosystem
Chapter 06   rsa cryptosystemChapter 06   rsa cryptosystem
Chapter 06 rsa cryptosystem
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty Quantification
 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMs
 
kactl.pdf
kactl.pdfkactl.pdf
kactl.pdf
 
Lifting 1
Lifting 1Lifting 1
Lifting 1
 

More from Seokhwan Kim

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)Seokhwan Kim
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingSeokhwan Kim
 
The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)Seokhwan Kim
 
Natural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionNatural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionSeokhwan Kim
 
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Seokhwan Kim
 
The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)Seokhwan Kim
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Seokhwan Kim
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Seokhwan Kim
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...Seokhwan Kim
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSeokhwan Kim
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingSeokhwan Kim
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...Seokhwan Kim
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...Seokhwan Kim
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionSeokhwan Kim
 
A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...Seokhwan Kim
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessSeokhwan Kim
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...Seokhwan Kim
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionSeokhwan Kim
 
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템Seokhwan Kim
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionSeokhwan Kim
 

More from Seokhwan Kim (20)

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic Tracking
 
The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)The Fifth Dialog State Tracking Challenge (DSTC5)
The Fifth Dialog State Tracking Challenge (DSTC5)
 
Natural Language in Human-Robot Interaction
Natural Language in Human-Robot InteractionNatural Language in Human-Robot Interaction
Natural Language in Human-Robot Interaction
 
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling...
 
The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)The Fourth Dialog State Tracking Challenge (DSTC4)
The Fourth Dialog State Tracking Challenge (DSTC4)
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog States
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic Tracking
 
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
A Graph-based Cross-lingual Projection Approach for Spoken Language Understan...
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognition
 
A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information access
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information Extraction
 
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
EPG 정보 검색을 위한 예제 기반 자연어 대화 시스템
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation Detection
 

Recently uploaded

Self scaling Multi cloud nomad workloads
Self scaling Multi cloud nomad workloadsSelf scaling Multi cloud nomad workloads
Self scaling Multi cloud nomad workloadsBram Vogelaar
 
owasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWEowasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWEArun Voleti
 
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTSi-engage
 
Microsoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ FabricMicrosoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ FabricJuan Fabian
 
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdfunit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdfStephenTec
 
Get Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdfGet Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdfAngela Johnson
 
Slide Deck - Milestone 9 alx mils .pptx
Slide Deck  - Milestone 9 alx mils .pptxSlide Deck  - Milestone 9 alx mils .pptx
Slide Deck - Milestone 9 alx mils .pptxYassineBissaoui1
 
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdfunit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdfStephenTec
 
SATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdfSATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdfnatarajan8993
 
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdfEnabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdfJohn Archer
 
unit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdfunit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdfStephenTec
 
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdfIndia's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdfgranitesrijan
 
Manual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12FxManual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12Fxjavierdavidvelasco17
 
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)GDSCNiT
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
unit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdfunit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdfStephenTec
 
unit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdfunit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdfStephenTec
 
BotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdfBotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdfnatarajan8993
 

Recently uploaded (20)

Self scaling Multi cloud nomad workloads
Self scaling Multi cloud nomad workloadsSelf scaling Multi cloud nomad workloads
Self scaling Multi cloud nomad workloads
 
owasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWEowasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWE
 
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
 
Microsoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ FabricMicrosoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ Fabric
 
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdfunit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
 
Get Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdfGet Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdf
 
Slide Deck - Milestone 9 alx mils .pptx
Slide Deck  - Milestone 9 alx mils .pptxSlide Deck  - Milestone 9 alx mils .pptx
Slide Deck - Milestone 9 alx mils .pptx
 
Features of IETM Software -Code and Pixels
Features of IETM Software -Code and PixelsFeatures of IETM Software -Code and Pixels
Features of IETM Software -Code and Pixels
 
Importance Of Smaket In Your Buussiness
Importance Of Smaket In Your BuussinessImportance Of Smaket In Your Buussiness
Importance Of Smaket In Your Buussiness
 
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdfunit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
 
SATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdfSATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdf
 
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdfEnabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
 
unit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdfunit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdf
 
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdfIndia's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
 
Manual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12FxManual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12Fx
 
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
unit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdfunit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdf
 
unit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdfunit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdf
 
BotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdfBotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdf
 

Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration

  • 1. Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration Seokhwan Kim Adobe Research, San Jose, CA, USA Punctuation Restoration Aims to generate the punctuation marks from the unpunctuated ASR outputs A sequence labelling task to predict the punctuation label yt for the t-th timestep in a given word sequence X = {x1 · · · xt · · · xT } where C ∈ {‘, COMMA , ‘.PERIOD , ‘?QMARK } yt =    c ∈ C if a punctuation symbol c is located between xt−1 and xt, O otherwise, Examples Input: so if we make no changes today what does tomorrow look like well i think you probably already have the picture Output: so , if we make no changes today , what does tomorrow look like ? well , i think you probably already have the picture . t xt yt 1 so O 2 if ,COMMA 3 we O 4 make O 5 no O 6 changes O 7 today O 8 what ,COMMA 9 does O 10 tomorrow O 11 look O t xt yt 12 like O 13 well ?QMARK 14 i ,COMMA 15 think O 16 you O 17 probably O 18 already O 19 have O 20 the O 21 picture O 22 </s> .PERIOD Related Work Multiple fully-connected hidden layers [1] Convolutional neural networks (CNNs) [2] Recurrent neural networks (RNNs) [4] RNNs with neural attention mechanisms [3] Main Contributions Context encoding from the stacked recurrent layers to learn more hierarchical aspects Neural attentions applied on every intermediate hidden layer to capture the layer-wise features Multi-head attentions instead of a single attention function to diversify the layer-wise attentions further Methods Model Architecture yt st-1st-1 stst ∙∙∙∙∙∙ n-bidirectional recurrent layers AttentionAttention AttentionAttention AttentionAttention m-heads mm mm Attention Attention Attention m-heads m m x1 ∙∙∙h1 1h1 1 h2 1h2 1 hn 1hn 1x1 ∙∙∙h1 1 h2 1 hn 1 xT ∙∙∙h1 Th1 T h2 Th2 T hn Thn TxT ∙∙∙h1 T h2 T hn T h2 th2 txt ∙∙∙h1 th1 t hn thn th2 txt ∙∙∙h1 t hn t ∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ ∙∙∙ Word Embedding Each word xt in a given input sequence X is represented as a d-dimensional distributed vector xt ∈ Rd Word embeddings can be learned from scratch with random initialization or fine-tuned from pre-trained word vectors Bi-directional Recurrent Layers Stacking multiple recurrent layers on top of each other −→ h i t =    GRU xt, −→ h 1 t−1 if i = 1, GRU hi−1 t , −→ h i t−1 if i = 2, · · · n, The backward state ←− h i t is computed also with the same way but in the reverse order from the end of the sequence T to t. Both directional states are concatenated into the output state hi t = −→ h i t, ←− h i t ∈ R2d to represent both contexts together Uni-directional Recurrent Layer The top-layer outputs [hn 1, · · · , hn T ] are forwarded to a uni-directional recurrent layer st = GRU (hn t , st−1) ∈ Rd , Multi-head Attentions The hidden state st constitutes a query to neural attentions based on scaled dot-product attention Attn (Q, K, V) = softmax QKT √ d V, Multi-head attentions are applied for every layer separately to compute the weighted contexts from different subspaces fi,j = Attn S · Wi,j Q , Hi · Wi,j K , Hi · Wi,j V , where S = [s1; s2; · · · ; sT ] and Hi = hi 1; hi 2; · · · ; hi T Softmax Classification yt = softmax st, f1,1 t , · · · , fn,m t Wy + by Experimental Setup IWSLT dataset English reference transcripts of TED Talks 2.1M, 296k, and 13k words for training, development, and test on reference, respectively Model Configurations Number of recurrent layers: n ∈ {1, 2, 3, 4} Number of attention heads: m ∈ {1, 2, 3, 4, 5} Attended contexts: layer-wise or only on top Implementation Details Word embeddings: 50d GloVe on Wikipedia Optimizer: Adam optimizer Loss: Negative log likelihood loss Hidden dimension d = 256 The same dimension across all the components Except the word embedding layer Mini-batch size: 128 Dropout rate: 0.5 Early stopping within 100 epochs Results Comparisons of the performances with different network configurations Single-head Top-layer Single-head Layer-wise Multi-head Top-layer Multi-head Layer-wise 0.62 0.63 0.64 0.65 0.66 0.67 0.68 F-measure 1 layer 2 layers 3 layers 4 layers Results COMMA PERIOD QUESTION OVERALL Models P R F P R F P R F P R F SER DNN-A [1] 48.6 42.4 45.3 59.7 68.3 63.7 - - - 54.8 53.6 54.2 66.9 CNN-2A [1] 48.1 44.5 46.2 57.6 69.0 62.8 - - - 53.4 55.0 54.2 68.0 T-LSTM [2] 49.6 41.4 45.1 60.2 53.4 56.6 57.1 43.5 49.4 55.0 47.2 50.8 74.0 T-BRNN [3] 64.4 45.2 53.1 72.3 71.5 71.9 67.5 58.7 62.8 68.9 58.1 63.1 51.3 T-BRNN-pre [3] 65.5 47.1 54.8 73.3 72.5 72.9 70.7 63.0 66.7 70.0 59.7 64.4 49.7 Single-BiRNN [4] 62.2 47.7 54.0 74.6 72.1 73.4 67.5 52.9 59.3 69.2 59.8 64.2 51.1 Corr-BiRNN [4] 60.9 52.4 56.4 75.3 70.8 73.0 70.7 56.9 63.0 68.6 61.6 64.9 50.8 DRNN-LWMA 63.4 55.7 59.3 76.0 73.5 74.7 75.0 71.7 73.3 70.0 64.6 67.2 47.3 DRNN-LWMA-pre 62.9 60.8 61.9 77.3 73.7 75.5 69.6 69.6 69.6 69.9 67.2 68.6 46.1 so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture so if wemake nochangestodaywhatdoestomorrow look likewell ithink youprobablyalreadyhave thepicture so if we make no changes today what does tomorrow look like well i think you probably already have the picture References [1] Xiaoyin Che, ChengWang, Haojin Yang, and Christoph Meinel, ‘Punctuation prediction for unsegmented transcript based on word vector’, LREC 2016. [2] Ottokar Tilk and Tanel Alumae, ‘LSTM for punctuation restoration in speech transcripts’, INTERSPEECH 2015. [3] Ottokar Tilk and Tanel Alumae, ‘Bidirectional recurrent neural network with attention mechanism for punctuation restoration’, INTERSPEECH 2016. [4] Vardaan Pahuja et al., ‘Joint learning of correlated sequence labelling tasks using bidirectional recurrent neural networks’, INTERSPEECH 2017. 345 Park Avenue, San Jose, CA 95110, USA Email: seokim@adobe.com