These are the presentation slides for the paper "Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech", presented at the 5th International Conference on Statistical Language and Speech Processing (SLSP 2017).
Abstract
Until very recently, the generation of punctuation marks for automatic speech recognition (ASR) output has been mostly done by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, pitch intonation that influence placing of punctuation marks on speech transcripts have been seldom used. We propose a method that uses recurrent neural networks, taking prosodic and lexical information into account in order to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with transcribed speech improves accuracy of punctuation generation.
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech
1. Attentional Parallel RNNs for
Generating Punctuation in
Transcribed Speech
Alp Öktem, Mireia Farrús, Leo Wanner
E-mail: alp.oktem@upf.edu
Other works: https://www.researchgate.net/profile/Alp_Oktem
Github: https://github.com/alpoktem
2. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
3. Motivation
...
so under that basis we put it out and said
look we're skeptical about this thing we
don't know but what can we do the
material looks good it feels right but we
just can't verify it and we then got a letter
just this week from the company who
wrote it wanting to track down the source
saying hey we want to track down the
source and we were like oh tell us more
what document is it precisely you're
talking about can you show that you had
legal authority over that document is it
really yours
...
ASR
4. Motivation
ASR
...
So under that basis, we put it out and
said, "Look, we're skeptical about this
thing. We don't know, but what can we
do? The material looks good, it feels
right, but we just can't verify it." And we
then got a letter just this week from the
company who wrote it, wanting to track
down the source saying, "Hey, we want
to track down the source." And we were
like, "Oh, tell us more. What document is
it, precisely, you're talking about? Can
you show that you had legal authority
over that document? Is it really yours?
...
5. Why punctuation?
Punctuation serves for:
● For human readability,
● To aid interpretation,
● For machine processing:
○ Parsing
○ Machine translation
6. Motivation
RESEARCH QUESTIONS
1. How to approach the problem of unpunctuated ASR output?
2. Which linguistic phenomena affect the placement of
punctuation marks in spoken text?
7. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
8. Punctuating Spoken Text
What signals punctuation in speech?
1) Syntax/Orthography:
Usage of commas, which are required e.g. in separating clauses, depends a lot on
syntax.
Today, I am giving a talk.
10. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
11. Related Work
❖ Data-driven models → Trainable on any language
❖ Recurrent Neural Networks (RNN) employed on two kinds of data:
Written data: lexical and POS features (Ballesteros et al., 2016)
Written + spoken data: lexical features and pause durations, training in two stages (Tilk et al., 2016)
Many prosodic features contributing to punctuation usage are neglected!
12. Our Approach
❖ Process lexical and prosodic information in parallel.
❖ Train a model solely from spoken data
❖ Test various acoustic features contributing to prosody:
➢ Pause durations
➢ Fundamental frequency (f0)
➢ Intensity
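One of these features, pause duration, can be derived directly from word-level time alignments. The sketch below is illustrative only: the input format and function name are assumptions, not the paper's actual pipeline (punkProse).

```python
# Hedged sketch: computing the pause-duration feature named above from
# word-level time alignments. Input format and function name are
# illustrative assumptions, not the actual punkProse pipeline.

def pause_durations(words):
    """Pause before each word = its start time minus the previous word's end."""
    pauses = [0.0]  # no pause before the first word
    for prev, cur in zip(words, words[1:]):
        pauses.append(round(max(0.0, cur["start"] - prev["end"]), 3))
    return pauses

# Toy alignment: the long pause before "i" is the kind of cue that can
# signal a comma or sentence boundary.
words = [
    {"w": "today", "start": 0.00, "end": 0.40},
    {"w": "i",     "start": 0.75, "end": 0.85},
    {"w": "am",    "start": 0.90, "end": 1.05},
]
print(pause_durations(words))  # [0.0, 0.35, 0.05]
```

f0 and intensity would instead be averaged per word from frame-level acoustic analysis; pause duration is shown here because it needs only the alignment itself.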
13. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
17. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
18. Data
❖ 1046 TED Talks
❖ 884 English speakers
❖ 156034 sentences
❖ Manual transcription available
https://www.ted.com/talks
20. Experimental Setup
❖ Reduced punctuation set
❖ 50 words per training sample
❖ 59811 samples
❖ 70%-15%-15%: training, testing, validation
❖ Word vocabulary: 13830
❖ Implementation using Theano
[Figure: reduced punctuation set, including a "no punctuation" class]
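The setup above can be sketched as fixed-length sampling plus a random split. Non-overlapping 50-token windows and a seeded shuffle are illustrative assumptions; the paper's exact segmentation may differ.

```python
import random

# Hedged sketch of the experimental setup: cut the corpus into 50-token
# training samples and split them 70%/15%/15%. Function names, the
# non-overlapping windowing, and the shuffle are assumptions.

SAMPLE_LEN = 50

def make_samples(tokens, sample_len=SAMPLE_LEN):
    """Non-overlapping fixed-length windows; a short tail is dropped."""
    return [tokens[i:i + sample_len]
            for i in range(0, len(tokens) - sample_len + 1, sample_len)]

def split_samples(samples, seed=0):
    """70% training, 15% testing, 15% validation, as on the slide."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    a, b = round(0.70 * n), round(0.85 * n)
    return shuffled[:a], shuffled[a:b], shuffled[b:]

tokens = [f"w{i}" for i in range(5000)]   # stand-in for the TED transcripts
samples = make_samples(tokens)            # 100 samples of 50 tokens each
train, test, dev = split_samples(samples) # 70 / 15 / 15
```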
21. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
24. Results from Testing Set
julian _ welcome . it's _ been _ reported _ that _ wikileaks _ your _ baby _ has _ in _
the _ last _ few _ years _ has _ released _ more _ classified _ documents _ than _ the
_ rest _ of _ the _ world's _ media _ combined . can _ that _ possibly _ be _ true ?
yeah , can _ it _ possibly _ be _ true ? it's _ a _ worry . isn't _ it _ that _ the _ rest _ of _
the _ world's _ media _ is _ doing _ such _ a _ bad _ job _ that _ a _ little _ group _ of
_ activists _ is _ able _ to _ release _ more _ of _ that _ type _ of _ information _ than _
the _ rest _ of _ the _ world _ press _ combined . how _ does _ it _ work ? how _ do _
people _ release _ the _ documents ?
who _ was _ the _ richest _ man ? still _ is _ the _ richest _ man _ in _ kenya .
when _ we _ released _ that _ report , we _ did _ so _ three _ days _ after _ the _ new
_ president _ kibaki _ had _ decided _ to _ pal _ up _ with _ the _ man _ that _ he _
was _ going _ to _ clean _ out , daniel _ arap _ moi .
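Outputs like the ones above are scored per punctuation class with precision, recall, and F1. A minimal, self-contained sketch on toy labels (not the paper's evaluation code) shows the computation:

```python
# Hedged sketch: per-class precision/recall/F1 over predicted inter-word
# punctuation labels. The labels and sequences below are toy examples,
# not the paper's test set or evaluation script.

def f1_per_class(gold, pred, label):
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# "_" marks the "no punctuation" class between words.
gold = ["_", ",", "_", ".", "_", "?", "_", "."]
pred = ["_", ",", "_", "_", "_", "?", ",", "."]
print(round(f1_per_class(gold, pred, "."), 2))  # 0.67
```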
25. Contents
1) Motivation
2) Punctuating spoken text
3) Approaches
a) Related Work
b) Our approach
4) Proposed model
5) Data and experimental setup
6) Results
7) Contributions
26. Contributions
❖ A study on the effect of various acoustic features on
punctuating spoken text.
❖ A model that is able to...
➢ process lexical/prosodic features in parallel
➢ integrate any aligned feature
❖ Training solely on spoken data
❖ Improvement compared to baseline (+9.1% in terms of F1-score)
Source code available at:
https://github.com/alpoktem/punkProse