The document describes the HESITA database, a database of hesitation events in European Portuguese speech. It contains around 27 hours of speech annotated with hesitation patterns, including filled pauses, repetitions, substitutions, deletions and insertions. Hesitation events were found to be far more common in spontaneous speech than in prepared speech, and the two most frequent phonetic forms of filled pauses were near-open and mid-central vowels. Segmentation of the hesitations showed that the interval from the start of a hesitation to the repair point is longer than the interval from the repair point to the end. The database is intended to support research in areas such as speech technology and automatic language processing.
1. The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013)
Stockholm, Sweden, August 21-23, 2013
Sara Candeias¹, Dirce Celorico¹, Jorge Proença¹, Arlindo Veiga¹,², Fernando Perdigão¹,²
¹Instituto de Telecomunicações, Coimbra, Portugal
²University of Coimbra, DEEC, Portugal
HESITA(tions) in Portuguese Database
2.
DiSS 2013
Stockholm, Sweden - August 21-23, 2013
SUMMARY
Scope
Goal
Description of the HESITA Database
Hesitation Patterns
Hesitations across speaking styles
Phonetic form of filled pauses
Segmentation of hesitations
Technical Information
Future
4.
SCOPE
Various scientific domains can benefit from the analysis of the hesitation distribution along the speech:
LINGUISTIC or CLINICAL/THERAPEUTIC areas
more directly interested in gathering knowledge for better identifying salient information in human speech communication
5.
SCOPE
SPEECH TECHNOLOGY
to increase the usability of speech systems, by overcoming the challenges posed by the presence of such phenomena.
6.
SCOPE
AUTOMATIC LANGUAGE PROCESSING
could benefit from a richer representation of the audio signal that incorporates speaking-style information (hesitations),
to reduce errors in automatic speech recognition,
to improve automatic conversational speech systems.
DETECTION OF HESITATION EVENTS
provides the segmentation of multimedia data into consistent parts,
leads to important applications: identification of the speech segments to train acoustic models for speech recognition in spontaneous speech.
7.
SCOPE
No database of hesitation events for European Portuguese has been freely available so far!
9.
GOAL
HESITA database:
a database for European Portuguese,
mainly focused on hesitation events,
containing a large and rich variety of speech data events.
Available through:
Meta-Net: http://metanet4u.l2f.inesc-id.pt/repository/search/
Project page: http://lsi.co.it.pt/spl/hesitation/downloads.html
11.
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
30 daily news programs, collected from podcasts of a European Portuguese television channel
~27 hours of speech
audio downsampled from 44.1 kHz to a 16 kHz sampling rate,
video information discarded,
studio and out-of-studio recordings, plus some telephone sessions.
12.
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
the prepared (read) speaking style is dominant:
most of the speech encompasses utterances of anchors and professional speakers (14 hours);
spontaneous speech segments are present in commentators, reporters, interviewers and interviewees (10 hours);
Lombard speech appears with low representativeness (18 minutes).
13.
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
Manually identified and annotated hesitation events, with patterns closely following the notation presented by E. Shriberg:
repetitions (r),
substitutions (s),
filler words (p),
deletions (d) and
insertions (i).
Only the speech segments were annotated in terms of hesitations.
Filled-pause vocalizations were transcribed using the SAMPA phonetic alphabet for European Portuguese.
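The five category codes above lend themselves to a simple lookup table. The sketch below is a hypothetical Python illustration; the dictionary and function names are my own and are not part of the HESITA distribution.

```python
# Hypothetical mapping of the HESITA hesitation codes to readable labels;
# the names below are illustrative, not taken from the database tooling.
HESITATION_CODES = {
    "r": "repetition",
    "s": "substitution",
    "p": "filler word",
    "d": "deletion",
    "i": "insertion",
}

def describe(code: str) -> str:
    """Return a readable label for a hesitation code, or 'unknown'."""
    return HESITATION_CODES.get(code, "unknown")

print(describe("r"))  # → repetition
```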
15.
DESCRIPTION OF THE HESITA DATABASE
HESITA Database
Annotation encompasses information regarding:
audio characteristics - background environments: studio, street, speech overlapping, noise and music;
acoustic events - non-speech events: music, jingles, laughter, coughing or clapping;
respiratory and other events: noise from cars or wind;
speaking style and speaker information.
All the annotations were performed using the Transcriber software tool.
SP_STU_E3_M represents:
an annotation of a speech segment (SP),
in a noise-free (studio) environment (STU),
with a high level of spontaneity (E3), and
from a male speaker (M).
SP_OVR_E3_M represents:
the annotation of a speech segment (SP),
with overlapping speech (OVR),
in a spontaneous speaking style with a high level of spontaneity (E3), and
from a male speaker (M).
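Labels of this kind can be unpacked mechanically. A minimal sketch in Python, assuming segment labels always carry the four underscore-separated fields shown in the two examples (the field meanings are inferred from the slides, and other codes may exist in the database):

```python
# Sketch: unpacking HESITA segment labels such as "SP_STU_E3_M".
# The four-field layout and field meanings are inferred from the
# examples in the slides; other codes may exist in the database.
def parse_label(label: str) -> dict:
    """Split a segment label into its four annotated fields."""
    segment, environment, spontaneity, gender = label.split("_")
    return {
        "segment": segment,          # SP = speech segment
        "environment": environment,  # STU = studio, OVR = overlapping speech
        "spontaneity": spontaneity,  # E3 = high level of spontaneity
        "gender": gender,            # M = male speaker
    }

print(parse_label("SP_OVR_E3_M")["environment"])  # → OVR
```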
(r.r) - repetitions (r),
(.w+) - extensions within a word (w+),
(f.) - filled pauses (f).
[6~]: (f.) - phonetic symbols attest extended vowel sounds or vocalic fillers.
res - presence of a respiratory event.
Considering the segments annotated according to the presence of hesitations, we can see how the hesitation patterns are distributed.
HESITATION PATTERNS
Top 10 most frequent hesitation patterns.
A total of 4608 events was observed;
filled pauses (f.) and vocalic extensions within a word (.w+) are the most common.
Pattern models display the way the hesitation occurs, indicating the order of the words before and after the so-called "repair point".
The repair point (the "." in (f.)) marks the place from which the hesitation is repaired and fluency is restored:
pattern (r.r) indicates that a word r was repeated as repair or reinforcement ("de.de");
in pattern (s-.s), the word s was cut and then substituted ("qua-.quantas");
in (r2.r), the same word r was repeated twice and finally restored ("com.com.com");
in (rs-.rs), the word r was repeated and the word s was cut and then substituted with correction ("da tu-.da totalidade").
More complex hesitation patterns are present…
Embedded hesitations:
"que vo-.que.que.que voltam.que.que possam"
("that re-.that.that.that return.that.that could")
((r s-.(r2.r) s).(r.r) s)
In general, hesitation events occur mainly in spontaneous speech:
4406, against 188 in read (prepared) speech and 12 in Lombard speech;
a total of 188 hesitations observed in 14 hours of read (prepared) speech results in a rate of 0.22 hesitations per minute;
4406 hesitation events in 10 hours of spontaneous speech result in a rate of 7.34 hesitations per minute.
HESITATIONS ACROSS SPEAKING STYLES
The density of hesitations in speech varies with the speaking style
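The per-minute rates follow directly from the raw counts and durations; a quick check:

```python
# Recomputing the reported hesitation rates from the raw counts.
def rate_per_minute(events: int, hours: float) -> float:
    return events / (hours * 60)

print(round(rate_per_minute(188, 14), 2))   # read (prepared): 0.22
print(round(rate_per_minute(4406, 10), 2))  # spontaneous: 7.34
```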
Distribution of the 5 most common hesitation patterns in read (prepared) speech:
high frequency of vocalic extensions (.w+) (39.36%),
closely followed by filled pauses (f.) (32.45%).
Top 5 most frequent hesitation patterns for read (prepared) speech.
Although the difference between these two occurrences is not large, the preference for extensions may reflect the fact that vocalic fillers tend to be more stigmatized in a prepared-speech context.
Repetitions become residual in read (prepared) speech.
The occurrence of substitutions is higher in prepared speech than in spontaneous speech (9.57% vs. 3.61%),
"proving" that they are more adequate as a communicative strategy, mainly where fluency of speaking is concerned.
PHONETIC FORM OF FILLED PAUSES
The two most common phonetic forms for filled pauses:
the near-open central vowel [ɐ] ([6] in SAMPA),
the mid-central vowel [ə] ([@] in SAMPA).
Phone distribution of filled pauses (top 10 most frequent).
This distribution supports the view that the
vocalizations preferred by Portuguese speakers
are around central vowels, corresponding to the
reduced vowels in an unstressed position.
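For readers moving between the two notations, a minimal SAMPA-to-IPA lookup. Only the correspondences stated in these slides are included; the nasalized entry assumes SAMPA's "~" nasalization mark, as in the [6~] transcription shown earlier:

```python
# Minimal SAMPA→IPA lookup for the filled-pause vowels named in the
# slides. Only these correspondences come from the presentation; the
# nasalized form assumes SAMPA's '~' nasalization convention.
SAMPA_TO_IPA = {
    "6": "ɐ",    # near-open central vowel
    "@": "ə",    # mid-central vowel
    "6~": "ɐ̃",   # nasalized near-open central vowel
}

print(SAMPA_TO_IPA["6"], SAMPA_TO_IPA["@"])  # → ɐ ə
```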
A slight preference for [ ] appears as well (around 3%).
A nasal preference is also evident: see [ɐ], [ɐm] and [ m ].
Our point here is not to associate a meaning with the filler sounds. However, there is strong empirical evidence that speakers use all of them to play a structuring role in speech.
The choice of one vocalic sound rather than another appears to be, at least in some contexts, motivated by the behavior of neighboring phonetic segments, in some way neutralizing the phonetic difference between the vocalic fillers.
Annotation of patterns closely follows the E. Shriberg methodology:
it encompasses the initial and final temporal marks;
the corresponding label contains the pattern and the orthographic transcription;
the repair point is marked temporally, showing the instant where the hesitation is corrected and speech fluency is recovered.
SEGMENTATION OF HESITATIONS
The period of time from the beginning of the hesitation to its repair point is much longer (0.61 seconds on average) than the period between the repair point and the end of the hesitation correction (0.34 seconds on average).
These trends concerning the distribution and duration of hesitation events may also be analyzed as manifestations of planning effort.
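Given the three temporal marks (start, repair point, end), the two sub-durations compared above are simple differences. A sketch with invented time stamps chosen to match the reported averages:

```python
# Sketch: the two sub-intervals of an annotated hesitation, given its
# start, repair-point and end time stamps. The values below are
# invented to match the reported averages of 0.61 s and 0.34 s.
def hesitation_spans(start: float, repair: float, end: float):
    """Return (start→repair duration, repair→end duration) in seconds."""
    return repair - start, end - repair

before, after = hesitation_spans(12.00, 12.61, 12.95)
print(round(before, 2), round(after, 2))  # → 0.61 0.34
```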
Directories and files:
The downloadable archive contains 58 audio files and the corresponding TRS files, which enclose the two parts of the 30 daily news programs.
Data structure of an entry:
The TRS files have an associated document type definition file, trans-14.dtd, which is provided in the archive.
TECHNICAL INFORMATION
Corpus size:
The TRS files contain a total of 4608 hesitation events.
The whole resource occupies 3 GB, mainly due to the audio files.
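TRS files are XML that validates against trans-14.dtd, so they can be read with standard XML tooling. A sketch using Python's standard library; the element names (Trans, Episode, Section, Turn, Sync) follow the Transcriber format, while the inline sample content is invented for illustration:

```python
# Sketch: reading Transcriber (.trs) XML with the standard library.
# Element names follow the Transcriber DTD (trans-14.dtd); the sample
# content itself is invented for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """<Trans>
  <Episode><Section type="report" startTime="0" endTime="4.2">
    <Turn startTime="0" endTime="4.2" speaker="spk1">
      <Sync time="0"/>(r.r) de.de
    </Turn>
  </Section></Episode>
</Trans>"""

root = ET.fromstring(SAMPLE)
for turn in root.iter("Turn"):
    print(turn.get("speaker"), turn.get("startTime"), turn.get("endTime"))
# → spk1 0 4.2
```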
FUTURE...
We expect that this database can be a relevant basis for further studies regarding a variety of speech phenomena.
Thank You
The 6th Workshop on Disfluency in Spontaneous Speech
Stockholm, Sweden, August 21-23, 2013
Sara Candeias 1
(saracandeias@co.it.pt)
Dirce Celorico 1
Jorge Proença 1
Arlindo Veiga 1,2
Fernando Perdigão 1,2
1Instituto de Telecomunicações, Coimbra, Portugal
2University of Coimbra, DEEC, Portugal
HESITA(tions) in Portuguese Database
Editor's Notes
…, which reveals a tendency in fluency also verified for other languages.