The 6th Workshop on Disfluency in Spontaneous Speech
Stockholm, Sweden August 21-23, 2013
HESITA(tions) in Portuguese Database
Sara Candeias (1), Dirce Celorico (1), Jorge Proença (1), Arlindo Veiga (1,2), Fernando Perdigão (1,2)
(1) Instituto de Telecomunicações, Coimbra, Portugal
(2) University of Coimbra, DEEC, Portugal
SUMMARY
• Scope
• Goal
• Description of the HESITA Database
• Hesitation Patterns
• Hesitations across speaking styles
• Phonetic form of filled pauses
• Segmentation of hesitations
• Technical Information
• Future
SCOPE
Various scientific domains can benefit from analysing how hesitations are distributed in speech:
• LINGUISTIC or CLINICAL/THERAPEUTIC areas
   • more directly interested in gathering knowledge to better identify salient information in human speech communication.
• SPEECH TECHNOLOGY
   • to increase the usability of speech systems by overcoming the challenges posed by such phenomena.
SCOPE
AUTOMATIC LANGUAGE PROCESSING
• could benefit from a richer representation of the audio signal that incorporates speaking-style information (hesitations),
• to reduce errors in automatic speech recognition,
• to improve automatic conversational speech systems.
DETECTION OF HESITATION EVENTS
• provides the segmentation of multimedia data into consistent parts,
• leads to important applications, such as identifying the speech segments used to train acoustic models for recognition of spontaneous speech.
SCOPE
No database of hesitation events for European Portuguese has been freely available so far!
GOAL
HESITA database
Database for European Portuguese,
• mainly focused on hesitation events,
• containing a large and rich variety of speech data events.
Available through:
• Meta-Net: http://metanet4u.l2f.inesc-id.pt/repository/search/
• Project page: http://lsi.co.it.pt/spl/hesitation/downloads.html
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
• 30 daily news programs
   • collected from podcasts of a European Portuguese television channel
• ~27 hours of speech
   • audio downsampled from 44.1 kHz to 16 kHz sampling rate,
   • video information discarded,
   • studio and out-of-studio recordings, some telephone sessions.
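The 44.1 kHz to 16 kHz conversion listed above is a rational-rate change. The slides do not say which tool performed it, so the sketch below only derives the integer up/down factors that a polyphase resampler would need. The helper name `resample_factors` is hypothetical.

```python
from math import gcd


def resample_factors(src_hz: int, dst_hz: int) -> tuple[int, int]:
    """Return (up, down) in lowest terms so that dst_hz == src_hz * up / down."""
    g = gcd(src_hz, dst_hz)
    return dst_hz // g, src_hz // g


# Rates taken from the slide; the helper itself is only an illustration.
up, down = resample_factors(44_100, 16_000)
print(up, down)  # 160 441
```

In other words, every 441 input samples correspond to 160 output samples.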
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
• the prepared (read) speaking style is dominant:
   • most of the speech consists of utterances by anchors and professional speakers (14 hours),
• spontaneous speech segments are present:
   • in commentators, reporters, interviewers and interviewees (10 hours),
• Lombard speech is sparsely represented (18 minutes).
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
Hesitation events were manually identified and annotated, with patterns closely following the notation presented by E. Shriberg:
   • repetitions (r),
   • substitutions (s),
   • filler words (p),
   • deletions (d) and
   • insertions (i).
• Only the speech segments were annotated for hesitations.
• Filled pause vocalizations were transcribed using the SAMPA phonetic alphabet for European Portuguese.
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
The annotation covers:
• audio characteristics - background environments:
   • studio, street, overlapping speech, noise and music,
• acoustic events - non-speech events:
   • music, jingles, laughter, coughing or clapping,
   • respiratory and other events:
      • noise from cars or wind,
• speaking style and speaker information.
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
All annotations were made with the Transcriber software tool.
DESCRIPTION OF THE HESITA DATABASE
SP_STU_E3_M represents:
• the annotation of a speech segment (SP),
• in a noise-free studio environment (STU),
• with a high level of spontaneity (E3), and
• from a male speaker (M).
DESCRIPTION OF THE HESITA DATABASE
SP_OVR_E3_M represents:
• the annotation of a speech segment (SP),
• with overlapping speech (OVR),
• in a spontaneous speaking style with a high level of spontaneity (E3), and
• from a male speaker (M).
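The two example labels suggest a four-field, underscore-separated naming scheme. A minimal sketch of how such labels could be split is given below, assuming every label has exactly these four fields. Only the codes shown on the slides (SP, STU, OVR, E3, M) are known here, and `SegmentLabel` and `parse_label` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SegmentLabel:
    kind: str         # e.g. "SP" = speech segment
    environment: str  # e.g. "STU" = studio, "OVR" = overlapping speech
    spontaneity: str  # e.g. "E3" = high level of spontaneity
    gender: str       # e.g. "M" = male speaker


def parse_label(label: str) -> SegmentLabel:
    """Split a label such as 'SP_STU_E3_M' into its four fields."""
    parts = label.split("_")
    if len(parts) != 4:
        # Assumption: labels with another number of fields are not handled.
        raise ValueError(f"expected 4 underscore-separated fields, got {label!r}")
    return SegmentLabel(*parts)


print(parse_label("SP_STU_E3_M"))
print(parse_label("SP_OVR_E3_M").environment)  # OVR
```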
DESCRIPTION OF THE HESITA DATABASE
   • (r.r) - repetitions (r),
   • (.w+) - extensions within a word (w+),
   • (f.) - filled pauses (f).
• [6~]: (f.) - the phonetic symbols record extended vowel sounds or vocalic fillers.
• res - presence of a respiratory event.
HESITATION PATTERNS
Considering the segments annotated for the presence of hesitations, we can see how the hesitation patterns are distributed.
Figure: Top 10 most frequent hesitation patterns.
• total of 4608 events observed,
• filled pauses (f.) and vocalic extensions within a word (.w+) are the most common.
HESITATION PATTERNS
Pattern models show how a hesitation unfolds, indicating the order of the words before and after the so-called "repair point".
• The repair point (the "." in patterns such as (f.)) marks the place from which the hesitation is repaired and fluency is restored.
   • pattern (r.r): a word r was repeated as repair or reinforcement ("de.de");
   • pattern (s-.s): the word s was cut off and then substituted ("qua-.quantas");
   • pattern (r2.r): the same word r was repeated twice and finally restored ("com.com.com");
   • pattern (rs-.rs): the word r was repeated and the word s was cut off and then substituted with a correction ("da tu-.da totalidade").
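The quoted examples can be taken apart mechanically. The sketch below assumes, based only on those examples, that "." separates successive attempts, that the last chunk is the repair, and that a trailing "-" marks a cut-off word. It is meant for the simple patterns above, not for embedded hesitations, and `describe_hesitation` is a hypothetical helper.

```python
def describe_hesitation(transcription: str) -> dict:
    """Split a simple hesitation transcription into abandoned attempts and repair."""
    # Assumption: '.' separates attempts, as in "com.com.com".
    attempts = [part.strip() for part in transcription.split(".") if part.strip()]
    if not attempts:
        raise ValueError("empty transcription")
    *abandoned, repair = attempts
    # Assumption: a word ending in '-' was cut off, as in "qua-".
    has_cutoff = any(
        word.endswith("-") for attempt in abandoned for word in attempt.split()
    )
    return {"abandoned": abandoned, "repair": repair, "has_cutoff": has_cutoff}


for example in ["de.de", "qua-.quantas", "com.com.com", "da tu-.da totalidade"]:
    print(example, "->", describe_hesitation(example))
```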
HESITATION PATTERNS
More complex hesitation patterns are also present.
Embedded hesitations:
"que vo-.que.que.que voltam.que.que possam"
"that re-.that.that.that return.that.that could"
((r s-.(r2.r) s).(r.r) s)
HESITATIONS ACROSS SPEAKING STYLES
The density of hesitations in speech varies with the speaking style.
In general, hesitation events occur mainly in spontaneous speech:
• 4406, against 188 in read (prepared) speech and 12 in Lombard speech,
• 188 hesitations in 14 hours of read (prepared) speech give a rate of 0.22 hesitations per minute,
• 4406 hesitation events in 10 hours of spontaneous speech give a rate of 7.34 hesitations per minute.
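The two rates quoted above follow directly from the event counts and durations. A small worked check, with `hesitations_per_minute` as a hypothetical helper name:

```python
def hesitations_per_minute(events: int, hours: float) -> float:
    """Events divided by the duration expressed in minutes."""
    return events / (hours * 60)


read_rate = hesitations_per_minute(188, 14)     # read (prepared) speech
spont_rate = hesitations_per_minute(4406, 10)   # spontaneous speech
print(f"{read_rate:.2f}")   # 0.22
print(f"{spont_rate:.2f}")  # 7.34
```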
HESITATIONS ACROSS SPEAKING STYLES
Distribution of the 5 most common hesitation patterns in read (prepared) speech
Figure: Top 5 most frequent hesitation patterns for read (prepared) speech.
• High frequency of vocalic extensions (.w+) (39.36%), closely followed by filled pauses (f.) (32.45%).
Although the difference between these two frequencies is not very pronounced, the preference for extensions may reflect the fact that vocalic fillers tend to be more stigmatized in a prepared-speech context.
• Repetitions become residual in read (prepared) speech.
• Substitutions occur more often in prepared speech than in spontaneous speech (9.57% vs. 3.61%), "proving" that they are better suited as a communicative strategy, mainly as far as fluency of speaking is concerned.
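The percentages for read (prepared) speech can be turned back into approximate event counts. This is a back-of-the-envelope sketch: it assumes each percentage is a share of the 188 read-speech hesitations reported earlier, and the resulting counts are inferred, not figures reported on the slides.

```python
def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of events corresponding to a percentage of a total."""
    return round(percent / 100 * total)


READ_TOTAL = 188  # hesitations observed in read (prepared) speech

for pattern, pct in [("(.w+)", 39.36), ("(f.)", 32.45), ("substitutions", 9.57)]:
    print(pattern, implied_count(pct, READ_TOTAL))
```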
PHONETIC FORM OF FILLED PAUSES
Figure: Phone distribution of filled pauses (top 10 most frequent).
• The two most common phonetic forms for filled pauses:
   • the near-open central vowel [ɐ] ([6] in SAMPA),
   • the mid-central vowel [ə] ([@] in SAMPA).
This distribution supports the view that the vocalizations preferred by Portuguese speakers cluster around central vowels, corresponding to the reduced vowels in unstressed position.
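Since the annotations use SAMPA while the slide quotes IPA, a small lookup can bridge the two. The sketch below covers only the two vowels named above and treats a trailing "~" as nasalization, which is the SAMPA convention seen in labels such as [6~]. `sampa_to_ipa` is a hypothetical helper and the table is deliberately incomplete.

```python
# Only the two symbols named on the slide; a full SAMPA table is much larger.
SAMPA_TO_IPA = {"6": "ɐ", "@": "ə"}
COMBINING_TILDE = "\u0303"  # IPA nasalization diacritic


def sampa_to_ipa(symbol: str) -> str:
    """Convert one SAMPA vowel symbol, optionally nasalized with '~', to IPA."""
    nasal = symbol.endswith("~")
    base = symbol[:-1] if nasal else symbol
    ipa = SAMPA_TO_IPA[base]  # raises KeyError for symbols outside the table
    return ipa + COMBINING_TILDE if nasal else ipa


print(sampa_to_ipa("6"), sampa_to_ipa("@"), sampa_to_ipa("6~"))
```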
• Slight [...] as well (around 3%).
• A nasal preference is also evident: see [ɐ], [ɐm] and [m].
Our point here is not to assign a meaning to the filler sounds. However, there is strong empirical evidence that speakers use all of them to play a structuring role in speech.
The choice of one vocalic sound over another appears to be motivated, at least in some contexts, by the behavior of neighboring phonetic segments, which to some extent neutralizes the phonetic difference between the vocalic fillers.
SEGMENTATION OF HESITATIONS
• The annotation of patterns closely follows E. Shriberg's methodology.
• Each annotation includes the initial and final temporal marks.
• The corresponding label contains the pattern and the orthographic transcription.
• The repair point is marked temporally, showing the instant at which the hesitation is corrected and fluency of speech is recovered.
The time from the beginning of a hesitation to its repair point is much longer (0.61 seconds on average) than the time from the repair point to the end of the hesitation correction (0.34 seconds on average).
These trends in the distribution and duration of hesitation events may also be analyzed as manifestations of planning effort.
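The two average durations compare the stretch before the repair point with the stretch after it. A minimal sketch of that computation follows, assuming each hesitation carries three time marks in seconds. The class name and the timestamps are made up for illustration and are not taken from the database.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Hesitation:
    start: float         # initial temporal mark (s)
    repair_point: float  # instant at which the hesitation is corrected (s)
    end: float           # final temporal mark (s)

    def before_repair(self) -> float:
        return self.repair_point - self.start

    def after_repair(self) -> float:
        return self.end - self.repair_point


# Invented timestamps, for illustration only.
events = [Hesitation(10.0, 10.5, 10.75), Hesitation(20.0, 20.75, 21.0)]
print(mean(h.before_repair() for h in events))  # 0.625
print(mean(h.after_repair() for h in events))   # 0.25
```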
TECHNICAL INFORMATION
Directories and files:
The downloadable archive contains 58 audio files and the corresponding TRS files, which cover the two parts of the 30 daily news programs.
Data structure of an entry:
The TRS files have an associated document type definition file, trans-14.dtd, which is provided in the archive.
Corpora size:
The TRS files contain a total of 4608 hesitation events.
The whole resource occupies 3 GB, mainly due to the audio files.
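TRS files are XML, so they can be read with a standard XML parser. The sketch below lists speaker turns and their time marks. It assumes the usual Transcriber layout (`Trans` > `Episode` > `Section` > `Turn`, with `startTime` and `endTime` attributes); the sample content is invented, and how HESITA stores the hesitation labels inside a turn is not shown on the slides.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed Transcriber layout; not taken from HESITA.
SAMPLE_TRS = """<!DOCTYPE Trans SYSTEM "trans-14.dtd">
<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="9.5">
      <Turn speaker="spk1" startTime="0" endTime="4.0">
        <Sync time="0"/>
        boa noite
      </Turn>
      <Turn speaker="spk2" startTime="4.0" endTime="9.5">
        <Sync time="4.0"/>
        obrigado
      </Turn>
    </Section>
  </Episode>
</Trans>"""


def list_turns(xml_text: str) -> list[tuple[str, float, float]]:
    """Return (speaker, start, end) for every Turn element in document order."""
    root = ET.fromstring(xml_text)
    return [
        (turn.get("speaker", ""), float(turn.get("startTime")), float(turn.get("endTime")))
        for turn in root.iter("Turn")
    ]


for speaker, start, end in list_turns(SAMPLE_TRS):
    print(speaker, start, end)
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`.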
FUTURE...
We hope that this database will be a relevant basis for further studies of a variety of speech phenomena.
Thank You
Contact: Sara Candeias (saracandeias@co.it.pt)

More Related Content

Similar to Di ss2013 hesita_presentation_final

Di ss2013 fillerspt_presentation_final
Di ss2013 fillerspt_presentation_finalDi ss2013 fillerspt_presentation_final
Di ss2013 fillerspt_presentation_finalSara Candeias
 
Disntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina ForascuDisntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina Forascu
oxwocs
 
Vocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech SynthesizerVocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech Synthesizer
IJESM JOURNAL
 
Vocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech SynthesizerVocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech Synthesizer
IJESM JOURNAL
 
DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...
DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...
DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...
cscpconf
 
Recent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performancesRecent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performances
IJECEIAES
 
B110512
B110512B110512
Barbiers iclave-fr
Barbiers iclave-frBarbiers iclave-fr
Barbiers iclave-fr
Raj Wali Khan
 
4080 15688-1-pb (1)
4080 15688-1-pb (1)4080 15688-1-pb (1)
4080 15688-1-pb (1)raja asad
 
Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...
Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...
Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...
Thierry Chanier
 
Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)
Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)
Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)
Amy Roman
 
Isolated English Word Recognition System: Appropriate for Bengali-accented En...
Isolated English Word Recognition System: Appropriate for Bengali-accented En...Isolated English Word Recognition System: Appropriate for Bengali-accented En...
Isolated English Word Recognition System: Appropriate for Bengali-accented En...
International Journal of Science and Research (IJSR)
 
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Guy De Pauw
 

Similar to Di ss2013 hesita_presentation_final (15)

Di ss2013 fillerspt_presentation_final
Di ss2013 fillerspt_presentation_finalDi ss2013 fillerspt_presentation_final
Di ss2013 fillerspt_presentation_final
 
Disntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina ForascuDisntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina Forascu
 
Vocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech SynthesizerVocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech Synthesizer
 
Vocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech SynthesizerVocal Translation For Muteness People Using Speech Synthesizer
Vocal Translation For Muteness People Using Speech Synthesizer
 
DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...
DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...
DETECTION OF AUTOMATIC THE VOT VALUE FOR VOICED STOP SOUNDS IN MODERN STANDAR...
 
Recent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performancesRecent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performances
 
B110512
B110512B110512
B110512
 
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual DictionariesOpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
 
Barbiers iclave-fr
Barbiers iclave-frBarbiers iclave-fr
Barbiers iclave-fr
 
532_Paper
532_Paper532_Paper
532_Paper
 
4080 15688-1-pb (1)
4080 15688-1-pb (1)4080 15688-1-pb (1)
4080 15688-1-pb (1)
 
Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...
Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...
Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities:...
 
Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)
Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)
Aphesis And Aphaeresis In Late Modern English Dialects (Based On EDD Online)
 
Isolated English Word Recognition System: Appropriate for Bengali-accented En...
Isolated English Word Recognition System: Appropriate for Bengali-accented En...Isolated English Word Recognition System: Appropriate for Bengali-accented En...
Isolated English Word Recognition System: Appropriate for Bengali-accented En...
 
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
 

Recently uploaded

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 

Recently uploaded (20)

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 

Di ss2013 hesita_presentation_final

  • 1. The 6th Workshop on Disfluency in Spontaneous Speech Stockholm, Sweden August 21-23, 2013 Sara Candeias 1 Dirce Celorico 1 Jorge Proença 1 Arlindo Veiga 1,2 Fernando Perdigão 1,2 1Instituto de Telecomunicações, Coimbra, Portugal 2University of Coimbra, DEEC, Portugal HESITA(tions) in Portuguese Database
  • 2. 2 DiSS 2013 Stockholm, Sweden - August 21-23, 2013  Scope  Goal  Description of the HESITA Database  Hesitation Patterns  Hesitations across speaking styles  Phonetic form of filled pauses  Segmentation of hesitations  Technical Information  Future SUMMARY
  • 3. 3 DiSS 2013 Stockholm, Sweden - August 21-23, 2013  Scope  Goal  Description of the HESITA Database  Hesitation Patterns  Hesitations across speaking styles  Phonetic form of filled pauses  Segmentation of hesitations  Technical Information  Future SUMMARY
  • 4. 4 DiSS 2013 Stockholm, Sweden - August 21-23, 2013  LINGUISTIC or CLINICAL/THERAPEUTIC areas  more directly interested in gathering knowledge for better identifying salient information in human speech communication Various scientific domains can beneficiate of the analysis of the hesitation distribution along the speech: SCOPE
  • 5. 5 DiSS 2013 Stockholm, Sweden - August 21-23, 2013  LINGUISTIC or CLINICAL/THERAPEUTIC areas  more directly interested in gathering knowledge for better identifying salient information in human speech communication Various scientific domains can beneficiate of the analysis of the hesitation distribution along the speech: SCOPE  SPEECH TECHNOLOGY  to increase the usability of speech systems, by overpassing the challenges proposed by the presence of such phenomena.
  • 6. SCOPE. AUTOMATIC LANGUAGE PROCESSING could benefit from a richer representation of the audio signal that incorporates speaking-style information (hesitations), both to reduce errors in automatic speech recognition and to improve automatic conversational speech systems. DETECTION OF HESITATION EVENTS provides the segmentation of multimedia data into consistent parts and leads to important applications, such as identifying the speech segments used to train acoustic models for recognition of spontaneous speech.
  • 7. SCOPE. No database of hesitation events for European Portuguese is freely available so far!
  • 9. GOAL. HESITA database: a database for European Portuguese, mainly focused on hesitation events, containing a large and rich variety of speech data events. Available through: Meta-Net: http://metanet4u.l2f.inesc-id.pt/repository/search/ ; Project page: http://lsi.co.it.pt/spl/hesitation/downloads.html
  • 11. DESCRIPTION OF THE HESITA DATABASE. HESITA Database: 30 daily news programs, collected from podcasts of a European Portuguese television channel; about 27 hours of speech; audio downsampled from a 44.1 kHz to a 16 kHz sampling rate; video information discarded; studio and out-of-studio recordings, plus some telephone sessions.
  • 12. DESCRIPTION OF THE HESITA DATABASE. HESITA Database: the prepared (read) speaking style is dominant, since most of the speech consists of utterances by anchors and professional speakers (14 hours); spontaneous speech segments come from commentators, reporters, interviewers and interviewees (10 hours); Lombard speech is poorly represented (18 minutes).
  • 13. DESCRIPTION OF THE HESITA DATABASE. HESITA Database: hesitation events were manually identified and annotated, with patterns closely following the notation presented by E. Shriberg.
  • 14. DESCRIPTION OF THE HESITA DATABASE. HESITA Database: manually identified and annotated hesitation events: repetitions (r), substitutions (s), filler words (p), deletions (d) and insertions (i). Only the speech segments were annotated for hesitations. Filled pause vocalizations were transcribed using the SAMPA phonetic alphabet for European Portuguese.
  • 15. DESCRIPTION OF THE HESITA DATABASE. HESITA Database: the annotation covers: audio characteristics (background environments): studio, street, overlapping speech, noise and music; acoustic events (non-speech events): music, jingles, laughter, coughing or clapping; respiratory and other events: noise from cars or wind; speaking style and speaker information.
  • 16. DESCRIPTION OF THE HESITA DATABASE. HESITA Database: all annotations were made with the Transcriber software tool.
  • 17. DESCRIPTION OF THE HESITA DATABASE. The label SP_STU_E3_M represents: the annotation of a speech segment (SP), in a noise-free studio environment (STU), with a high level of spontaneity (E3), from a male speaker (M).
  • 18. DESCRIPTION OF THE HESITA DATABASE. The label SP_OVR_E3_M represents: the annotation of a speech segment (SP), with overlapping speech (OVR), in a spontaneous speaking style with a high level of spontaneity (E3), from a male speaker (M).
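The two label examples above suggest a four-field, underscore-separated scheme. The following is a minimal sketch of how such labels might be split and described, assuming that layout holds for every label. The field names (`segment`, `environment`, `style`, `speaker`) and the helper functions are hypothetical, and only the codes shown on the slides are mapped; any other code is reported as undocumented rather than guessed.

```python
from typing import NamedTuple

# Only codes that appear on the slides are listed here. The field names are
# illustrative assumptions, not part of the HESITA documentation.
KNOWN_CODES = {
    "segment": {"SP": "speech segment"},
    "environment": {"STU": "studio (noise-free)", "OVR": "overlapping speech"},
    "style": {"E3": "high level of spontaneity"},
    "speaker": {"M": "male speaker"},
}


class SegmentLabel(NamedTuple):
    segment: str
    environment: str
    style: str
    speaker: str


def parse_label(label: str) -> SegmentLabel:
    """Split a label such as 'SP_STU_E3_M' into its four assumed fields."""
    parts = label.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {label!r}")
    return SegmentLabel(*parts)


def describe(label: str) -> dict:
    """Map each field to its description, flagging codes not shown on the slides."""
    parsed = parse_label(label)
    return {
        field: KNOWN_CODES[field].get(code, "not documented in these slides")
        for field, code in parsed._asdict().items()
    }


if __name__ == "__main__":
    for name in ("SP_STU_E3_M", "SP_OVR_E3_M"):
        print(name, describe(name))
```

Keeping the lookup table separate from the parser means further codes can be added once they are confirmed against the corpus documentation.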
  • 19. DESCRIPTION OF THE HESITA DATABASE. (r.r) - repetitions (r); (.w+) - extensions within a word (w+); (f.) - filled pauses (f); [6~]: (f.) - phonetic symbols mark extended vowel sounds or vocalic fillers; res - presence of a respiratory event.
  • 21. HESITATION PATTERNS. Considering the segments annotated according to the presence of hesitations, we can see how the hesitation patterns are distributed. Figure: Top 10 most frequent hesitation patterns. A total of 4608 events were observed; filled pauses (f.) and vocalic extensions within a word (.w+) are the most common.
  • 22. HESITATION PATTERNS. Pattern models show how the hesitation occurs, indicating the order of the words before and after the so-called "repair point". Figure: Top 10 most frequent hesitation patterns. The repair point (the dot, as in (f.)) marks the place from which the hesitation is repaired and fluency is restored. Pattern (r.r) indicates that a word r was repeated as repair or reinforcement ("de.de"); in pattern (s-.s), the word s was cut and then substituted ("qua-.quantas"); in (r2.r), the same word r was repeated twice and finally restored ("com.com.com"); in (rs-.rs), the word r was repeated and the word s was cut and then substituted with a correction ("da tu-.da totalidade").
  • 23. HESITATION PATTERNS. More complex hesitation patterns are also present. Embedded hesitations: "que vo-.que.que.que voltam.que.que possam" ("that re-.that.that.that return.that.that could"), annotated as ((rs-.(r2.r)s).(r.r)s).
  • 25. HESITATIONS ACROSS SPEAKING STYLES. The density of hesitations in speech varies with the speaking style. In general, hesitation events occur mainly in spontaneous speech: 4406, against 188 in read (prepared) speech and 12 in Lombard speech. The 188 hesitations observed in 14 hours of read (prepared) speech give a rate of 0.22 hesitations per minute; the 4406 hesitation events in 10 hours of spontaneous speech give a rate of 7.34 hesitations per minute.
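The per-minute rates quoted on this slide follow directly from the event counts and durations. A small sketch of the arithmetic, using only the figures given above (the function name is illustrative):

```python
def hesitations_per_minute(events: int, hours: float) -> float:
    """Average number of hesitation events per minute of speech."""
    return events / (hours * 60)


# Counts and durations as stated on the slide.
read_rate = hesitations_per_minute(188, 14)          # read (prepared) speech
spontaneous_rate = hesitations_per_minute(4406, 10)  # spontaneous speech

print(f"read: {read_rate:.2f}/min, spontaneous: {spontaneous_rate:.2f}/min")
# prints: read: 0.22/min, spontaneous: 7.34/min
```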
  • 26. HESITATIONS ACROSS SPEAKING STYLES. Distribution of the 5 most common hesitation patterns in read (prepared) speech. Figure: Top 5 most frequent hesitation patterns for read (prepared) speech. Vocalic extensions (.w+) are the most frequent (39.36%), closely followed by filled pauses (f.) (32.45%). Although the difference between the two is not large, the preference for extensions may reflect the fact that vocalic fillers tend to be more stigmatized in a prepared-speech context.
  • 27. HESITATIONS ACROSS SPEAKING STYLES (read/prepared speech). Repetitions become residual in read or prepared speech. Substitutions are more frequent in prepared speech than in spontaneous speech (9.57% vs. 3.61%), "proving" that they are a more adequate communicative strategy, mainly where fluency of speaking is concerned.
  • 29. PHONETIC FORM OF FILLED PAUSES. Figure: Phone distribution of filled pauses (top 10 most frequent). The two most common phonetic forms for filled pauses are the near-open central vowel [ɐ] ([6] in SAMPA) and the mid-central vowel [ə] ([@] in SAMPA). This distribution supports the view that the vocalizations preferred by Portuguese speakers cluster around central vowels, corresponding to the reduced vowels in unstressed position.
  • 30. PHONETIC FORM OF FILLED PAUSES. Slight […] as well (around 3%). A nasal preference is also evident: see [ɐ], [ɐm] and [m]. Our point here is not to attach a meaning to the filler sounds. However, there is strong empirical evidence that speakers use all of them to play a structuring role in speech.
  • 31. PHONETIC FORM OF FILLED PAUSES. The choice of one vocalic sound rather than another appears, at least in some contexts, to be motivated by the behavior of neighboring phonetic segments, which in some way neutralizes the phonetic difference between the vocalic fillers.
  • 33. SEGMENTATION OF HESITATIONS. The annotation of patterns closely follows E. Shriberg's methodology. It includes the initial and final temporal marks; the corresponding label contains the pattern and the orthographic transcription; the repair point is marked temporally, showing the instant at which the hesitation is corrected and fluency is recovered. The time from the beginning of the hesitation to its repair point is much longer (0.61 seconds on average) than the time from the repair point to the end of the hesitation correction (0.34 seconds on average). These trends in the distribution and duration of hesitation events may also be analyzed as manifestations of planning effort.
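The segmentation described above (initial mark, repair point, final mark, pattern and transcription) can be pictured as a small record from which the two durations are derived. This is only a sketch: the class and field names are hypothetical, the slides do not say how the marks are stored, and the timings in the example are invented for illustration rather than taken from the corpus.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HesitationSegment:
    start: float         # initial temporal mark, in seconds
    repair_point: float  # instant at which the hesitation is repaired
    end: float           # final temporal mark, in seconds
    pattern: str         # hesitation pattern, e.g. "(r.r)"
    text: str            # orthographic transcription, e.g. "de.de"

    def __post_init__(self) -> None:
        if not self.start <= self.repair_point <= self.end:
            raise ValueError("expected start <= repair_point <= end")

    @property
    def before_repair(self) -> float:
        """Time from the beginning of the hesitation to the repair point."""
        return self.repair_point - self.start

    @property
    def after_repair(self) -> float:
        """Time from the repair point to the end of the correction."""
        return self.end - self.repair_point


def mean_durations(segments):
    """Return (mean time before repair, mean time after repair)."""
    return (
        mean(s.before_repair for s in segments),
        mean(s.after_repair for s in segments),
    )


# Invented timings, for illustration only (not corpus data).
examples = [
    HesitationSegment(10.0, 10.5, 10.75, "(r.r)", "de.de"),
    HesitationSegment(20.0, 20.75, 21.0, "(s-.s)", "qua-.quantas"),
]
before, after = mean_durations(examples)
print(before, after)
```

Applied to the real annotations, the same two averages would correspond to the 0.61 s and 0.34 s figures reported on the slide.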
  • 35. TECHNICAL INFORMATION. Directories and files: the downloadable archive contains 58 audio files and the corresponding TRS files, which cover the two parts of the 30 daily news programs. Data structure of an entry: the TRS files have an associated data type definition file, trans-14.dtd, which is provided in the archive. Corpus size: the TRS files contain a total of 4608 hesitation events. The whole resource occupies 3 GB, mainly due to the audio files.
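TRS files are the XML output of the Transcriber tool, so they can be read with a standard XML parser. The sketch below walks the usual Transcriber nesting (`Trans` > `Episode` > `Section` > `Turn`, with `Sync` time marks inside each turn) and pairs every `Sync` time with the text that follows it. The sample document is made up for illustration. How HESITA encodes hesitation patterns and segment labels inside these elements is not stated on the slides, so that part is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document in the general shape of a Transcriber file.
# Real HESITA files will carry more attributes and elements.
SAMPLE_TRS = """<Trans audio_filename="example" version="1">
  <Episode>
    <Section type="report" startTime="0.0" endTime="5.0">
      <Turn startTime="0.0" endTime="5.0">
        <Sync time="0.0"/>
        first segment text
        <Sync time="2.5"/>
        second segment text
      </Turn>
    </Section>
  </Episode>
</Trans>
"""


def iter_segments(trs_xml: str):
    """Yield (time, text) for each Sync mark found inside a Turn.

    Only the text directly following a Sync element is returned; text that
    follows other inline elements in the same turn is not collected here.
    """
    root = ET.fromstring(trs_xml)
    for turn in root.iter("Turn"):
        for sync in turn.iter("Sync"):
            yield float(sync.get("time")), (sync.tail or "").strip()


if __name__ == "__main__":
    for time, text in iter_segments(SAMPLE_TRS):
        print(f"{time:6.2f}s  {text}")
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; the text encoding should be taken from the XML declaration of the actual files.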
  • 36. FUTURE. We hope that this database will be a useful basis for further studies on a variety of speech phenomena. Thank you.
  • 37. HESITA(tions) in Portuguese Database. Contact: Sara Candeias (saracandeias@co.it.pt), Instituto de Telecomunicações, Coimbra, Portugal.

Editor's Notes

  1. , which reveals a tendency in fluency also verified for other languages