Main Goal:
Improve automatic syntactic parsing of spontaneous spoken sentences using prosodic cues
Theoretical Motivation:
Automatic parsing is negatively affected by syntactic ambiguity (Kummerfeld et al., 2012)
Prosody can help resolve some syntactic ambiguities (Cutler et al., 1997)
Syntactic structure is related to prosodic structure (Selkirk, 1986, among many other studies)
Natural Language Processing: Parts of Speech Tagging, its Classes, and How to ... (Rajnish Raj)
Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag, such as noun, verb, or adjective, to each word in a sentence. It involves determining the most likely tag sequence given the probabilities of tags occurring before or after other tags, and of words occurring with certain tags. POS tagging is the first step in many NLP applications and helps determine the grammatical role of words. It involves calculating bigram and lexical probabilities from annotated corpora to find the tag sequence with the highest joint probability.
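The "highest joint probability" search described above is typically done with the Viterbi algorithm. The following is a minimal sketch; the tagset and all transition/emission probabilities are invented for illustration, not estimated from a real corpus.

```python
# Toy Viterbi decoder: finds the tag sequence with the highest joint
# probability under bigram (tag-to-tag) and lexical (tag-to-word) models.
trans = {  # P(tag_i | tag_{i-1}), with "<s>" as the start symbol
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.5,
}
emit = {  # P(word | tag)
    ("DET", "the"): 0.7,
    ("NOUN", "dog"): 0.4, ("NOUN", "barks"): 0.1,
    ("VERB", "barks"): 0.6,
}
TAGS = ["DET", "NOUN", "VERB"]

def viterbi(words):
    # best[tag] = (probability of best path ending in tag, that path)
    best = {t: (trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0), [t])
            for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            p, path = max(
                ((best[prev][0] * trans.get((prev, t), 0.0)
                  * emit.get((t, w), 0.0), best[prev][1] + [t])
                 for prev in TAGS),
                key=lambda x: x[0])
            new[t] = (p, path)
        best = new
    prob, path = max(best.values(), key=lambda x: x[0])
    return path, prob

tags, p = viterbi(["the", "dog", "barks"])
print(tags)  # ['DET', 'NOUN', 'VERB']
```

Dynamic programming keeps one best path per tag at each position, so the search stays linear in sentence length rather than exponential.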
This document describes a study that developed a generic transliteration tool to transliterate words from English to Hindi for cross-lingual information access applications. It discusses the challenges of transliteration from English to Hindi due to differences in phonetic properties between the two languages. The study then evaluates the performance of four existing transliteration editor tools - Xlit, GIST, Google Transliteration, and Microsoft Hindi writing tool - on test data comprising English words, personal names, and other terms. Test results show variations in the transliterated output for the same input across the different tools.
Controlled Natural Language Generation from a Multilingual FrameNet-based Gra... (Normunds Grūzītis)
We present a currently bilingual but potentially multilingual FrameNet-based grammar library implemented in Grammatical Framework. The contribution of this paper is two-fold. First, it offers a methodological approach to automatically generate the grammar based on semantico-syntactic valence patterns extracted from FrameNet-annotated corpora. Second, it provides a proof of concept for two use cases illustrating how the acquired multilingual grammar can be exploited in different CNL applications in the domains of arts and tourism.
PPT-CCL: A Universal Phrase Tagset for Multilingual Treebanks (Lifeng (Aaron) Han)
Many syntactic treebanks and parser toolkits have been developed over the past twenty years, including dependency structure parsers and phrase structure parsers. Phrase structure parsers usually use different phrase tagsets for different languages, which complicates multilingual research. This paper designs a refined universal phrase tagset that contains 9 commonly used phrase categories. Furthermore, the mapping covers 25 constituent treebanks and 21 languages. The experiments show that the universal phrase tagset can generally reduce the costs of the parsing models and even improve parsing accuracy.
This document is a lecture on tokenization and word counts in natural language processing. It discusses concepts like types and tokens, and Zipf's law and Heaps' law, which relate the number of word types to the number of tokens in a text. The document also covers challenges in tokenization like sentence segmentation and provides examples of rule-based and machine learning approaches to tokenization. It introduces word normalization techniques like lemmatization and stemming and provides exercises for students to practice word counting, lemmatization, stemming, and removing stop words from texts.
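The type/token distinction in the lecture can be shown in a few lines. The tokenizer here is a deliberately simple rule-based split; real tokenizers handle punctuation, clitics, and sentence boundaries more carefully. The example text is arbitrary.

```python
# Counting word tokens (running words) vs. word types (distinct words).
import re
from collections import Counter

text = "The cat sat on the mat. The mat was flat."
tokens = re.findall(r"[a-z]+", text.lower())  # lowercase word tokens
counts = Counter(tokens)

print(len(tokens))              # 10 tokens
print(len(counts))              # 7 distinct types
print(counts.most_common(2))    # [('the', 3), ('mat', 2)]
```

As the text grows, Heaps' law predicts that the number of types grows roughly as a power of the number of tokens.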
Annotated text corpora are an important resource for natural language processing research and technologies. Corpora can be annotated with linguistic information like parts of speech, morphology, syntax, and semantics through a layered approach. This involves manually or automatically tagging words, sentences, and texts with linguistic metadata. Well-annotated corpora are essential for tasks like morphological analysis, part-of-speech tagging, parsing, and machine translation model training.
A Vietnamese Language Model Based on Recurrent Neural Network (Viet-Trung TRAN)
Language modeling plays a critical role in many natural language processing (NLP) tasks such as text prediction, machine translation, and speech recognition. Traditional statistical language models (e.g. n-gram models) can only offer words that have been seen before and cannot capture long-range word context. Neural language models provide a promising way to overcome this shortcoming of statistical language models. This paper investigates Recurrent Neural Network (RNN) language models for Vietnamese at the character and syllable levels. Experiments were conducted on a large dataset of 24M syllables, constructed from 1,500 movie subtitles. The experimental results show that our RNN-based language models yield reasonable performance on the movie subtitle dataset; concretely, our models outperform n-gram language models in terms of perplexity.
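Perplexity, the metric used in the comparison above, is the inverse probability of the test data normalized by token count; lower is better. A minimal sketch, with made-up per-token probabilities standing in for the two models:

```python
# Perplexity: PP = exp(-(1/N) * sum(log p_i)) over the test tokens.
import math

def perplexity(token_probs):
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

ngram_probs = [0.10, 0.05, 0.20, 0.08]   # hypothetical n-gram model
rnn_probs   = [0.15, 0.12, 0.25, 0.10]   # hypothetical RNN model
print(perplexity(ngram_probs) > perplexity(rnn_probs))  # True
```

The model that assigns higher probability to the held-out text gets the lower perplexity, which is why it serves as a direct comparison between n-gram and RNN models.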
This thesis project will analyze interference from Spanish (L1) to English (L2) for the personal pronoun "it" among English language learners. The researcher, Lorgia Rueda de León Barbosa, will conduct a descriptive study to determine how factors like English proficiency level, exposure to English, and age influence misuse of the pronoun. The study aims to provide useful information for both students and teachers on common errors and how to explicitly teach pronoun functions to improve language acquisition.
This document provides an introduction to prosodic morphology. It defines prosodic morphology as a theory that posits underlying morpheme representations as templates defined by prosodic units like feet and syllables. The principles of prosodic morphology state that morphological processes involving sound shape are defined by categories in the prosodic hierarchy. Examples are given of reduplication and truncation processes across languages that are explained by prosodic structure. The role of feet, minimal words, and how quantity sensitivity affects word structure are also discussed.
This document discusses using machine learning techniques like neural networks to help decipher ancient scripts and languages. It describes how character-level sequence-to-sequence models can be used to identify cognates between related languages. Additional techniques like network flows and dynamic programming are used to model monotonic character alignments and jointly segment and match tokens between known and unknown languages. The approaches are able to identify cognates between languages like Ugaritic and Hebrew as well as segment and match the unknown Iberian language. Neural models that incorporate linguistic features like phonological embeddings are shown to improve decipherment performance.
The document discusses several key topics in natural language processing and computational linguistics:
1. It defines the basic units of language like words, tokens, types and texts.
2. It describes techniques for extracting text from various sources like files, web pages and corpora and preprocessing the text by removing HTML tags and normalizing whitespace.
3. It discusses empirical observations about word frequencies like Zipf's Law and Heaps' Law, which state that a small number of words occur very frequently while most words occur rarely.
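Zipf's Law can be stated as rank × frequency ≈ constant. A quick sketch with synthetic counts chosen to be exactly Zipfian (real corpora only approximate this):

```python
# For an ideal Zipfian distribution, rank * frequency is constant,
# so the few top-ranked words account for most tokens.
counts = [1200, 600, 400, 300, 240, 200]  # synthetic, already rank-sorted
products = [rank * freq for rank, freq in enumerate(counts, start=1)]
print(products)  # every rank*frequency product is 1200 here
```

On real text the products drift rather than staying exactly constant, but plotting rank against frequency on log-log axes still yields a roughly straight line.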
Presented by Ted Xiao at RobotXSpace on 4/18/2017. This workshop covers the fundamentals of Natural Language Processing, crucial NLP approaches, and an overview of NLP in industry.
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA (ijnlc)
Morphological segmentation is a fundamental task in language processing. Some languages, such as Arabic and Tigrinya, have words packed with very rich morphological information. Therefore, unpacking this information becomes a necessary task for many downstream natural language processing tasks. This paper presents the first morphological segmentation research for Tigrinya. We constructed a new morphologically segmented corpus with 45,127 manually segmented tokens. Conditional random fields (CRF) and window-based long short-term memory (LSTM) neural networks were employed separately to develop our boundary detection models. We applied language-independent character and substring features for the CRF and character embeddings for the LSTM networks. Experiments were performed with four variants of the Begin-Inside-Outside (BIO) chunk annotation scheme. We achieved a 94.67% F1 score using bidirectional LSTMs with a fixed-size window approach to morpheme boundary detection.
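The BIO scheme mentioned above tags each character as B(egin) or I(nside) of a morpheme; decoding the tags recovers the segmentation. A small sketch of that decoding step, using an invented English example rather than Tigrinya:

```python
# Decode character-level BIO tags into morpheme segments: a "B" tag
# closes the current morpheme and starts a new one.
def bio_to_morphemes(chars, tags):
    morphemes, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "B" and current:
            morphemes.append(current)
            current = ch
        else:
            current += ch
    if current:
        morphemes.append(current)
    return morphemes

# "unpacking" segmented as un + pack + ing (illustrative only)
chars = list("unpacking")
tags  = ["B", "I", "B", "I", "I", "I", "B", "I", "I"]
print(bio_to_morphemes(chars, tags))  # ['un', 'pack', 'ing']
```

The boundary-detection models in the paper predict exactly these per-character tags; the decoding itself is deterministic.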
BERT is a language representation model that was pre-trained using two unsupervised prediction tasks: masked language modeling and next sentence prediction. It uses a multi-layer bidirectional Transformer encoder based on the original Transformer architecture. BERT achieved state-of-the-art results on a wide range of natural language processing tasks including question answering and language inference. Extensive experiments showed that both pre-training tasks, as well as a large amount of pre-training data and steps, were important for BERT to achieve its strong performance.
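The masked-language-modeling objective works by hiding a fraction of the input tokens and training the model to recover them. This sketch only builds the masked input (it omits BERT's 80/10/10 replacement split and the model itself); the sentence, mask rate, and seed are arbitrary.

```python
# Build a masked copy of a token sequence, remembering the hidden originals.
import random

def mask_tokens(tokens, mask_rate=0.15, seed=1):
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok           # the model must predict this token
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked)   # some positions replaced by [MASK]
```

Because the objective conditions on context from both directions at once, it pairs naturally with the bidirectional Transformer encoder described above.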
Introduction to NLP with some practical exercises (tokenization, keyword extraction, topic modelling) using Python libraries like NLTK, Gensim and TextBlob, plus a general overview of the field.
1) The document outlines a study that examines whether working memory capacity is related to L2 reading and listening comprehension, and whether this relationship is mediated by proficiency level.
2) It presents background on working memory and its role in L1 and L2 language processing, then describes the study's research questions and methodology which involves administering working memory and proficiency tests to upper-intermediate and advanced English learners.
3) The study aims to provide insight into how working memory capacity and proficiency level interact in L2 reading and listening comprehension.
An Improved Approach to Word Sense Disambiguation (Surabhi Verma)
This document presents a knowledge-based algorithm for word sense disambiguation that uses WordNet. It computes the similarity between a target word and nearby words based on their intersection in WordNet hierarchies, distance between the words, and hierarchical level. The algorithm was evaluated on the SemCor corpus and performed better than existing supervised and unsupervised methods by frequently ranking the correct sense first or within the top three results.
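The hierarchy-based scoring idea can be illustrated in miniature: score two senses by where their lowest common ancestor sits in a hypernym hierarchy. The tiny hierarchy and the scoring function below are toy stand-ins, not the paper's algorithm or WordNet itself.

```python
# Toy hypernym hierarchy (child -> parent) and a path-based similarity:
# 1 / (edges between the two words via their lowest common ancestor).
hypernym = {
    "dog": "canine", "canine": "mammal", "mammal": "animal",
    "cat": "feline", "feline": "mammal",
    "car": "vehicle", "vehicle": "artifact",
}

def ancestors(word):
    chain = [word]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

def similarity(a, b):
    pa, pb = ancestors(a), ancestors(b)
    common = [x for x in pa if x in pb]
    if not common:
        return 0.0                     # no shared ancestor: unrelated
    lca = common[0]                    # lowest common ancestor
    return 1.0 / (pa.index(lca) + pb.index(lca) + 1)

print(similarity("dog", "cat") > similarity("dog", "car"))  # True
```

The paper's actual measure additionally weighs the distance between the words in the sentence and the hierarchical level of the intersection, but the intuition is the same: senses whose hierarchies intersect low (specifically) score higher than senses that only meet at the top.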
State-of-the-art Automatic Speech Recognition (ASR) systems lack the ability to identify spoken words that have non-standard pronunciations. In this paper, we present a new classification algorithm to identify pronunciation variants. It uses the Dynamic Phone Warping (DPW) technique to compute pronunciation-by-pronunciation phonetic distances, with a critical-distance threshold as the classification criterion. The proposed method consists of two steps: a training step that estimates the critical-distance parameter from transcribed data, and a classification step that uses this criterion to sort input utterances into pronunciation variants and OOV words. The algorithm is implemented in Java. The classifier is trained on data sets from the TIMIT speech corpus and the CMU pronunciation dictionary. The confusion matrix and the precision, recall, and accuracy metrics are used for performance evaluation. Experimental results show significant improvement over existing classifiers.
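The distance-plus-threshold idea can be sketched with plain edit distance over phone symbols standing in for Dynamic Phone Warping; the phone strings, the one-entry lexicon, and the threshold value are all invented for illustration.

```python
# Classify an utterance as a pronunciation variant of a dictionary entry
# (distance within a critical threshold) or as OOV.
def phone_distance(a, b):
    # classic Levenshtein distance over phone symbols, rolling single row
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (pa != pb))  # substitution
    return dp[-1]

CRITICAL = 2  # hypothetical critical-distance parameter from training

def classify(utterance, dictionary):
    best = min(dictionary, key=lambda p: phone_distance(utterance, p))
    d = phone_distance(utterance, best)
    return ("variant", best) if d <= CRITICAL else ("OOV", None)

lexicon = [["T", "AH", "M", "EY", "T", "OW"]]                # "tomato"
print(classify(["T", "AH", "M", "AA", "T", "OW"], lexicon))  # a variant
print(classify(["Z", "IH", "B", "R", "AH"], lexicon))        # OOV
```

In the paper, the training step tunes the threshold from transcribed data rather than fixing it by hand, and DPW aligns phones with phonetically informed costs rather than unit costs.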
This document provides an introduction to natural language processing (NLP). It discusses the brief history of NLP, major NLP tasks such as machine translation and text classification, common NLP techniques like part-of-speech tagging and parsing, main problems in NLP including ambiguity, and an overview of the topics to be covered in the course such as tokenization, parsing, and topic modeling. The course aims to use Python and R to complete various NLP tasks.
Hindi digits recognition system on speech data collected in different natural... (csandit)
This paper presents a baseline digit speech recognizer for the Hindi language. The recording environment differs across speakers, since the data was collected in their respective homes: vehicle horn noise in some road-facing rooms, internal background noise such as opening doors in others, and silence elsewhere. All these recordings are used for training the acoustic model, which is trained on audio data from 8 speakers. The vocabulary size of the recognizer is 10 words. The HTK toolkit is used for building the acoustic model and evaluating the recognition rate. The efficiency of the recognizer on the recorded data is shown at the end of the paper, and possible directions for future research are suggested.
The document provides an overview of the Natural Language Toolkit (NLTK), a Python library for natural language processing that includes corpora, tokenizers, stemmers, part-of-speech taggers, parsers, and other tools. The document outlines the modules in NLTK and their functionality, such as the nltk.corpus module for corpora, nltk.tokenize and nltk.stem for tokenizers and stemmers, and nltk.tag for part-of-speech tagging. It also provides instructions on installing NLTK and downloading its data.
The document summarizes a presentation on a neural response study investigating how Taiwanese Mandarin speakers process two types of syllable gaps - tonal gaps (TGs) and segmental gaps (SGs). Two ERP experiments were conducted - a passive listening task and a lexical decision task. The results showed different ERP responses for TGs, SGs, and real syllables between the two tasks. The findings provide insights into how task demands can influence speech perception processing and the potential separate representations of tones and segments in the mental lexicon of Mandarin speakers.
This document summarizes a study on how the structural position of sounds affects their acquisition by English learners of Spanish. It tested if learners rely on distributional information when acquiring sounds. The study found that learners were most successful with sounds that have overlapping distributions in English and Spanish, and least successful with sounds only in Spanish. This suggests learners do use distribution to learn sounds and confirms the importance of comparing sound systems between languages.
What can typological knowledge bases and language representations tell us abo... (Isabelle Augenstein)
One of the core challenges in typology is to record properties of languages in a structured way. As a result of manual efforts, typological knowledge bases have emerged, which contain information about languages’ phonological, morphological, and syntactic properties, as well as information about language families. Ideally, such typological knowledge bases would provide useful information for multilingual NLP models to learn how to selectively share parameters.
A related area of research suggests a different way of encoding properties of languages, namely to learn language representation vectors directly from text documents.
In this talk, I will analyse and contrast these two ways of encoding linguistic properties, as well as present research on how the two can benefit one another.
This document provides an overview of a research talk on human-in-the-loop speech synthesis technology given by Yuki Saito from the University of Tokyo. The talk was organized in two parts, with the first part presented by Saito covering human-in-the-loop deep speaker representation learning and speaker adaptation for multi-speaker text-to-speech. Saito's research group at the University of Tokyo works on text-to-speech and voice conversion using deep learning techniques. Their recent work focuses on incorporating human listeners into the training process to learn speaker representations that better capture perceptual speaker similarity.
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ... (ESEM 2014)
Context: Real-time speech translation technology is available today, but we still lack a complete understanding of how it may affect communication in global software projects. Goal: To investigate the adoption of combined speech recognition and machine translation to overcome language barriers among stakeholders who are remotely negotiating software requirements.
Method: We performed an empirical simulation-based study including: Google Web Speech API and Google Translate service, two groups of four subjects, speaking Italian and Brazilian Portuguese, and a test set of 60 technical and non-technical utterances.
Results: Our findings revealed that, overall: (i) a satisfactory accuracy in terms of speech recognition was achieved, although significantly affected by speaker and utterance differences; (ii) adequate translations tend to follow accurate transcripts, meaning that speech recognition is the most critical part for speech translation technology.
Conclusions: Results provide positive albeit initial evidence for the possibility of using speech translation technologies to help globally distributed team members communicate in their native languages.
International Refereed Journal of Engineering and Science (IRJES)
This document summarizes a study that analyzed the effect of prosody on the temporal realization of segments in Chinese. The study examined how prosodic word boundaries and prosodic phrase boundaries impact the voice onset time (VOT) of consonants and the duration of vowels. Key findings include: 1) Vowels preceding prosodic phrase boundaries were longer than those preceding prosodic word boundaries; 2) Place of articulation of the second consonant also impacted vowel duration; 3) VOT of initial consonants was affected by prosody but not place of articulation; 4) VOT of final consonants was impacted by place of articulation but not prosody. The results demonstrate the interaction between prosodic structure and segmental temporal realization.
This document summarizes a study that investigated how modifying the duration of acoustic cues in fricative consonants affects perception of voicing and place of articulation. The study synthesized fricative-vowel syllables with selective time expansions of fricative noise duration and vowel formant transition duration. Listeners then identified voicing and place of articulation in the syllables in quiet and noise conditions. Results showed that lengthening formant transitions significantly improved place of articulation identification, while lengthening noise duration had little effect on voicing or place cues. The study aimed to determine how clear speech production features like expanded durations can enhance perception of fricatives.
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...ESEM 2014
Context: Real-time speech translation technology is today available but still lacks a complete understanding of how such technology may affect communication in global software projects. Goal: To investigate the adoption of combining speech recognition and machine translation in order to overcome language barriers among stakeholders who are remotely negotiating software requirements.
Method: We performed an empirical simulation-based study including: Google Web Speech API and Google Translate service, two groups of four subjects, speaking Italian and Brazilian Portuguese, and a test set of 60 technical and non-technical utterances.
Results: Our findings revealed that, overall: (i) a satisfactory accuracy in terms of speech recognition was achieved, although significantly affected by speaker and utterance differences; (ii) adequate translations tend to follow accurate transcripts, meaning that speech recognition is the most critical part for speech translation technology.
Conclusions: Results provide a positive albeit initial evidence towards the possibility to use speech translation technologies to help globally distributed team members to communicate in their native languages.
International Refereed Journal of Engineering and Science (IRJES)irjes
This document summarizes a study that analyzed the effect of prosody on the temporal realization of segments in Chinese. The study examined how prosodic word boundaries and prosodic phrase boundaries impact the voice onset time (VOT) of consonants and duration of vowels. Key findings include: 1) Vowels preceding prosodic phrase boundaries were longer than those preceding prosodic word boundaries; 2) Place of articulation of the second consonant also impacted vowel duration; 3) VOT of initial consonants was affected by prosody but not place of articulation; 4) VOT of final consonants was impacted by place of articulation but not prosody. The results demonstrate the interaction between prosodic structure and segmental temporal realization
This document summarizes a study that investigated how modifying the duration of acoustic cues in fricative consonants affects perception of voicing and place of articulation. The study synthesized fricative-vowel syllables with selective time expansions of fricative noise duration and vowel formant transition duration. Listeners then identified voicing and place of articulation in the syllables in quiet and noise conditions. Results showed that lengthening formant transitions significantly improved place of articulation identification, while lengthening noise duration had little effect on voicing or place cues. The study aimed to determine how clear speech production features like expanded durations can enhance perception of fricatives.
This document describes a study on voice conversion using sequence-to-sequence learning. The researchers propose converting context posterior probabilities from the source to target speaker using sequence-to-sequence learning to allow for variable-length conversion. They also propose jointly training the recognition and synthesis models to better relate recognition accuracy to synthesis accuracy. Experimental results found that sequence-to-sequence learning enabled variable-length conversion and joint training improved speaker similarity and quality of converted speech over conventional methods.
Error Detection and Feedback with OT-LFG for Computer-assisted Language LearningCITE
HU, Yuxiu (Harbin Institute of Technology Shenzhen Graduate School, China)
BODOMO, Adams (The University of Hong Kong)
http://citers2013.cite.hku.hk/en/paper_603.htm
---------------------------
Author(s) bear(s) the responsibility in case of any infringement of the Intellectual Property Rights of third parties.
---------------------------
CITE was notified by the author(s) that if the presentation slides contain any personal particulars, records and personal data (as defined in the Personal Data (Privacy) Ordinance) such as names, email addresses, photos of students, etc, the author(s) have/has obtained the corresponding person's consent.
Development of text to speech system for yoruba languageAlexander Decker
This document describes the development of a text-to-speech (TTS) system for the Yoruba language. It begins with background on TTS systems and an overview of previous work developing TTS for other languages but not extensively for Yoruba. The authors then describe the architecture and design of the Yoruba TTS system they developed using a concatenative synthesis method. This includes analyzing the phonology and syllable structure of Yoruba, and developing components for syllable identification, prosody assignment, and speech signal processing. An evaluation of the system found 70% of respondents found it usable.
Louise Stringer and Paul Iverson from UCL investigated how accent influences word recognition and electrophysiological measures of speech processing for native English and Spanish listeners. They found that a regional Scottish accent and non-native Spanish accent showed some influence on early phonological and lexical processing even in quiet conditions. More intelligible accents in noise elicited larger brain responses, suggesting processing difficulties with accented speech occur even without noise. Accents may affect listeners' expectations about upcoming words.
Predictability of Consonant Perception Ability Through a Listening Comprehens...Kosuke Sugai
This study examined whether a typical English listening comprehension test can predict learners' ability to perceive English consonants. The researchers administered a 30-item listening comprehension test to 107 Japanese EFL learners and selected 22 learners who scored between 25-27. These learners then completed a phoneme judgment task with 17 minimal word pairs differing in initial consonants. The results showed the listening test did not predict learners' consonant perception abilities and that learners with similar listening scores varied in their overall and individual consonant perception skills. The study supports the idea that common listening tests do not measure phonetic abilities.
This study investigated how the brain integrates linguistic and perceptual information during language comprehension using electro- and magnetoencephalography. The researchers found:
1) Linguistically complex words (inflected verbs) engaged a left-lateralized network including temporal and frontal regions, whereas perceptually complex words activated a bilateral network.
2) Functional connectivity analysis revealed partially overlapping neural networks supporting linguistic and perceptual processing, with both enhancing connections between left temporal regions and bilateral frontal regions.
3) Connectivity between left temporal and frontal regions specifically increased for linguistically complex words, suggesting their role in morphosyntactic computations during language comprehension.
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...IJERA Editor
This work presents an application of Fundamental Frequency (Pitch), Linear Predictive Cepstral Coefficient
(LPCC) and Mel Frequency Cepstral Coefficient (MFCC) in identification of sex of the speaker in speech
recognition research. The aim of this article is to compare the performance of these three methods for
identification of sex of the speakers. A successful speech recognition system can help in non critical operations
such as presenting the driving route to the driver, dialing a phone number, light switch turn on/off, the coffee
machine on/off etc. apart from speaker verification-caste wise, community wise and locality wise including
identification of sex. Here an attempt has been made to identify the sex of Bodo speakers through vowel
utterance by following Pitch value, LPCC and MFCC techniques. It is found here that the feature vector
organization of LPCC coefficients provides a more promising way of speech-speaker recognition in case of
Bodo Language than that of Pitch and MFCC.
Portrait poster on
"Text matters but speech influences: A computational analysis of syntactic ambiguity resolution"
in CogSci 2020
Paper available at:
https://cognitivesciencesociety.org/cogsci20/papers/0448/index.html
Segmentation Words for Speech Synthesis in Persian Language Based On Silencepaperpublications3
Abstract: In speech synthesis in text to speech systems, the words usually break to different parts and use from recorded sound of each part for play words. This paper use silent in word's pronunciation for better quality of speech. Most algorithms divide words to syllable and some of them divide words to phoneme, but This paper benefit from silent in intonation and divide words at silent region and then set equivalent sound of each parts whereupon joining the parts is trusty and speech quality being more smooth . this paper concern Persian language but extendable to another language. This method has been tested with MOS test and intelligibility, naturalness and fluidity are better.
Keywords:TTS, SBS, Sillable, Diphone.
This document provides an overview of a course on methods and algorithms for speech recognition. The 10-week course covers topics like speech production acoustics, time/frequency representation using digital filters, linear predictive modeling, speech coding, phonetics, speech synthesis, and speech recognition. It requires 3 practical homework assignments and a final assessment. References for further reading on speech processing topics are also provided.
江振宇/It's Not What You Say: It's How You Say It!台灣資料科學年會
This document discusses prosody modeling for Mandarin Chinese speech. It begins with an introduction to prosody and its importance in communication. Prosody can be measured acoustically using features like fundamental frequency, duration, intensity, and pause. A prosodic hierarchy for Mandarin is proposed with different levels like syllable, prosodic word, phrase, and breath group. Unsupervised joint prosody labeling and modeling is introduced as an approach that models observed prosodic features to determine prosodic tags without human perception. Parameters and a hierarchical model are used to represent prosodic structures and model relationships between linguistic information and prosodic-acoustic features.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
Human beings generate different speech waveforms while speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words which facilitate preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming technique for global alignment and shortest distance measurements. The DPW algorithm can be used to enhance the pronunciation dictionaries of the well-known languages like English or to build pronunciation dictionaries to the less known sparse languages. The precision measurement experiments show 88.9% accuracy.
Won Ik Cho presented on his research related to intention understanding in Korean natural language processing. He discussed developing annotation guidelines and corpora to classify Korean utterances by speech act, considering factors like intonation, context, and rhetoricalness. He proposed a method using text-based analysis combined with speech-aided disambiguation. Future work includes developing structured paraphrasing for argument extraction and an improved dialog manager.
COMPUTATIONAL APPROACHES TO THE SYNTAX-PROSODY INTERFACE: USING PROSODY TO IMPROVE PARSING
1. Computational Approaches to the Syntax-Prosody Interface: Using Prosody to Improve Parsing
Dissertation Defense
Hussein Ghaly
December 12th, 2019
2. Goal and Motivation
Main Goal:
Improve automatic syntactic parsing of spontaneous spoken sentences using prosodic cues
Theoretical Motivation:
● Automatic parsing is negatively affected by syntactic ambiguity (Kummerfeld et al., 2012)
● Prosody can help resolve some syntactic ambiguities (Cutler et al., 1997)
● Syntactic structure is related to prosodic structure (Selkirk, 1986, among many other studies)
3. Challenges and Opportunities
Challenges
- Lack of congruence between syntactic and prosodic structures
- Lack of interdisciplinary engagement in prosody research between computational linguistics and other branches of linguistics
Opportunities
- Availability of parsing frameworks
- Availability of ToBI annotation
- Availability of speech corpora (e.g. the Switchboard Corpus)
- Interest in Natural Language Understanding for speech
4. What is prosody
● “(1) acoustic patterns of F0, duration, amplitude, spectral tilt, and segmental
reduction, and their articulatory correlates, that can be best accounted for by
reference to higher-level structures, and (2) the higher-level structures that
best account for these patterns.” (Shattuck-Hufnagel and Turk, 1996)
● Includes a number of speech phenomena, including: Prosodic phrasing,
Stress, Intonation, Rhythm
● Autosegmental-Metrical Theory (Ladd, 2008) was proposed to organize these
components together
5. Prosodic Structure is a Hierarchy of Constituents
● all languages have hierarchically ordered prosodic structure
● languages make use of the same set of prosodic categories (Elfner, 2018)
(Illustration of prosodic hierarchy, from Elfner (2018))
6. Prosodic structure is much flatter than syntactic structure
Example: "As a matter of fact that's what I'm doing"
● each word is a prosodic word (⍵)
● the prosodic words group into two intermediate (phonological) phrases (iP)
● the iPs group into a single intonational phrase (IP)
(Prosodic structure shown alongside the corresponding syntactic structure)
7. Prosody is influenced by syntax and other factors
Syntax
● Suci (1967): non-syntactically structured word lists have more prosodic variation than syntactically structured sentences
● Prosody can resolve some syntactic ambiguities
● Some syntactic structures are marked prosodically (e.g. parentheticals "John, said Mary, was nice", and tag questions "She's Italian, isn't she?")
● Clause boundaries are marked prosodically (e.g. when John left, I cried)
Other factors
● prosodic grouping can be different from syntactic grouping
○ syntax follows the grouping S (V O), while prosody follows the grouping (S V) O (Martin, 1970)
● speech rate
● utterance length and constituent length
● semantic and pragmatic factors
8. A theoretical model depicts factors affecting prosody
Model for factors influencing prosody (from Turk and Shattuck-Hufnagel, 2014)
● prosodic structure as a theoretical construct, representing the convergence of all these factors
● Constituent length can be added to utterance length factors
9. Syntax-Prosody Interface - some phonological theories
● Indirect Reference: phonological processes apply to prosodic domains
(constituents), which are related to syntactic constituents
○ Selkirk (1986) Align-XP: syntactic constituents share one edge with prosodic constituents
(Align-R or Align-L depending on the language)
○ Truckenbrodt (1995, 1999) Wrap-XP: a constraint that demands that each syntactic phrase is
contained within a phonological phrase
○ Match Theory (Selkirk 2006, 2009, Elfner 2012, Myrberg 2013): syntactic clauses map to
Intonational phrases, syntactic phrases map to phonological phrases, and morphosyntactic
words map to prosodic words
10. ToBI is a system for annotating prosody
ToBI: Tones and Break Indexes
A system for annotating prosodic
information (Silverman et al., 1992)
Based on theories of prosodic
structure by Beckman and
Pierrehumbert (1986)
Break indexes (0-4) reflect
disjuncture levels between words
from (Veilleux et al., 2006)
11. Part 1 - The Effect of Syntactic Phrase
Length on Prosody
12. Phrase length affects prosody of double center embedded sentences in English
Double Center Embedded sentences (from Fodor and Nickels, 2011):
- Encouraging phrase length (ENC) (short inner phrases): split into 3 chunks
- Discouraging phrase length (DISC) (long inner phrases): split into 4+ chunks
ENC:  NP1 | NP2 | NP3 | VP1 | VP2 | VP3
      the rusty old ceiling pipes | that the plumber | my dad | trained | fixed | continue to leak occasionally
DISC: NP1 | NP2 | NP3 | VP1 | VP2 | VP3
      the pipes | that the unlicensed plumber | the new janitor | reluctantly assisted | tried to repair | burst
13. No difference in prosody due to phrase length was found in French
Desroses (2014) manually examined the frequency of pauses (silent intervals >= 250 ms) at the edges of syntactic constituents and found no difference between the two sentence types:
ENC: Le joli ballon jaune vif (1) que l'enfant (2) que le maître (3) punit (4) lâcha (5) est vraiment coincé dans l'arbre.
DISC: Le ballon (1) que le jeune enfant (2) que le maître d'école (3) punit très souvent (4) lâcha bêtement (5) est jaune.

       Before NP2 (loc. 1)   After NP2 (loc. 2)   Before VP2 (loc. 4)   After VP2 (loc. 5)
ENC    6.7 %                 14.07 %              22.6 %                28.15 %
DISC   8.5 %                 10.4 %               19.63 %               27.04 %
14. Data reanalyzed using judge annotation and forced alignment
Re-analyzing recordings collected by Desroses (271 ENC and 272 DISC
recordings) to identify the prosodic boundaries at the edges of syntactic phrases by:
● Obtaining judgments by two native speakers of French, of where they perceive
the prosodic boundaries, in a subset of recordings (48 recordings)
● Using forced alignment (automatically mapping each word to its corresponding
portion of the audio file), to obtain silent pause durations between words (397
recordings)
15. Sentences were presented to judges for annotation
An example sentence of the set presented to judges
16. Forced alignment indicated pauses between words
● Montreal Forced Alignment (McAuliffe et al., 2017) was used
● edges of syntactic phrases are identified manually in a copy of the sentence,
for example:
○ Le ballon <1> que le jeune enfant <2> que le maître d'école <3> punit très souvent <4> lâcha
bêtement <5> est jaune.
● Words in the forced alignment with their start and end times are mapped to
those in the copy
● Pause values are calculated at the five locations
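The pause calculation at the marked locations can be sketched as follows (a minimal illustration, not the original implementation; the alignment tuple format and the `<n>` marker convention are assumptions based on the example above):

```python
import re

def pauses_at_locations(aligned_words, marked_sentence):
    """Compute silent-pause durations at marked phrase edges.

    aligned_words: list of (word, start_sec, end_sec) tuples from the
    forced alignment, in sentence order.
    marked_sentence: sentence text with boundary markers such as <1> ... <5>
    inserted at the edges of syntactic phrases (markers are assumed to fall
    between words, never sentence-initially).
    Returns {location: pause_sec}: the gap between the end of the word
    before the marker and the start of the word after it.
    """
    pauses = {}
    word_idx = 0  # position in aligned_words of the next word to consume
    for token in marked_sentence.split():
        marker = re.fullmatch(r"<(\d+)>", token)
        if marker:
            prev_end = aligned_words[word_idx - 1][2]
            next_start = aligned_words[word_idx][1]
            pauses[int(marker.group(1))] = max(0.0, next_start - prev_end)
        else:
            word_idx += 1
    return pauses
```

For example, an alignment in which "ballon" ends at 0.5 s and "que" starts at 0.7 s yields a 200 ms pause at location 1.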
17. Judges indicated a difference in the average number of breaks for each sentence type
- average number of prosodic boundaries over all five syntactic boundary locations, for each judge (48 recordings)
- first judge: ENC 2.43, DISC 2.92; second judge: ENC 2.5, DISC 3.2
18. Forced alignment indicated more pauses before VP2 for DISC sentences
● At location 4, the percentage of ENC sentences with a pause of 250 ms or greater was 14%, versus 19.1% for DISC sentences (397 recordings)
● Average pause duration at location 4: 105 ms (ENC), 154 ms (DISC)
○ After excluding recordings with pauses > 1 second: 80 ms (ENC, 189 recordings), 110 ms (DISC, 202 recordings); p-value .056
19. Part 2 - Resolving Syntactic Ambiguities
Using Prosody
20. Can prosody resolve syntactic ambiguity?
Goal: examine whether it is possible to identify the syntactic attachment using prosody, both by human listeners and by computers
I saw the boy with the telescope
21. Comma ambiguity and PP-attachment ambiguity are investigated
An experiment for production and perception of sentences with ambiguities:
● Comma ambiguity:
○ John, said Mary, was the nicest person at the party.
○ John said Mary was the nicest person at the party.
● PP-attachment ambiguity:
○ I have a new telescope. I saw the boy with the telescope.
○ One of the boys got a telescope. I saw the boy with the telescope.
22. Ambiguous sentences were recorded by speakers and presented to listeners
Using crowdsourcing, through Amazon Mechanical Turk (MTurk), workers were
recruited for the production and perception experiment
Production Experiment
● Ambiguous sentences were recorded by a number of naive native speakers
● Recordings by 6 speakers, with the clearest recordings, were selected
Perception Experiment
● Ambiguous sentences were presented to naive human participants both in
audio and text formats, to answer comprehension questions
● Experiment was organized in phases, where in each phase questions were
based on the recordings of only one of the speakers
23. Listeners answer questions about ambiguous sentences
Question Types
1. Comma-ambiguity - Text
2. PP-attachment ambiguity with context - Text
3. PP-attachment ambiguity without context - Text
4. Comma-ambiguity - Audio
5. PP-attachment ambiguity with context - Audio
6. (and 7) PP-attachment ambiguity without context - Audio (two
different sentences of this question type)
Example: https://champolu.net/mturk/listen.html?abcd
24. PP-attachment ambiguity is more accurately resolved in audio than in text
● Participants' disambiguation accuracy: text 49%, audio 63% (p-value < .001, independent t-test)
● Results are after excluding sentences not understood properly even with context, and listeners with overall low comprehension accuracy

Question Type                                     Accuracy
comma ambiguity - text                            99%
comma ambiguity - audio                           92%
PP-attachment ambiguity - text - with context     97%
PP-attachment ambiguity - audio - with context    98%
PP-attachment ambiguity - text - no context       49%
PP-attachment ambiguity - audio - no context      63%
25. Larger pauses yield better accuracy for high attachment sentences
● Higher pause values and higher normalized duration of the last NP word lead to higher disambiguation accuracy by listeners for sentences with high attachment
Normalized duration: actual duration of the word divided by its expected duration, where expected duration is the sum of the speaker's average duration for each phoneme

Pause (ms)    high attachment           low attachment
              N     avg. accuracy       N     avg. accuracy
0             73    35.28%              103   82.26%
10            11    41.41%              11    76.73%
20+           34    66.93%              4     57.29%

Normalized    high attachment           low attachment
duration      N     avg. accuracy       N     avg. accuracy
<1.0          10    22.08%              32    84.84%
1             17    28.53%              39    77.16%
1.1           18    52.03%              23    85.44%
1.2           22    44.29%              15    79.63%
>1.2          51    52.74%              7     19.72%
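The normalized-duration measure can be sketched as follows (a minimal illustration; the phoneme labels and the per-speaker average-duration table are assumptions, not the original implementation):

```python
def normalized_duration(word_phonemes, actual_duration, avg_phoneme_dur):
    """Normalized duration = actual duration / expected duration.

    word_phonemes: list of phoneme labels for the word (e.g. from the aligner).
    actual_duration: measured duration of the word in seconds.
    avg_phoneme_dur: speaker-specific mapping phoneme -> average duration (sec).
    Expected duration is the sum of the speaker's average duration for each
    phoneme in the word; values above 1.0 mean the word was spoken longer
    than expected.
    """
    expected = sum(avg_phoneme_dur[p] for p in word_phonemes)
    return actual_duration / expected
```

For instance, a word whose phonemes average 0.2 s in total for this speaker, but which was spoken in 0.3 s, gets a normalized duration of 1.5.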
26. Machine learning predicts attachment of recorded sentences
● Based on the data just presented, a machine learning system (decision trees) used pause and duration values as features to predict the attachment.
● System accuracy ranged from 63% to 73%, depending on how the data are split into training and test portions (e.g. the system performs better when training and testing on different portions of the recordings of the same speaker)

Speaker ID   shuffled all   intra-speaker    odd speaker   odd sentence   odd recording   Listener
                            classification   out           out            out             accuracy
my64         75.0%          50.0%            67.5%         66.3%          60.0%           64.0%
wdn          62.5%          67.5%            55.0%         53.8%          70.0%           54.0%
ds           70.0%          90.0%            70.0%         66.3%          85.0%           83.0%
dz           57.5%          60.0%            52.5%         57.5%          65.0%           56.0%
mm           70.0%          87.5%            70.0%         70.0%          85.0%           52.0%
tk           69.4%          83.2%            61.1%         65.3%          77.5%           69.0%
Average      67.4%          73.0%            62.7%         63.2%          73.8%           63.0%
27. Does attachment affect prosody in spontaneous sentences?
● A corpus analysis was conducted for the syntactic and prosodic data in the ToBI-annotated subset of the Switchboard corpus (SWB) (Godfrey et al., 1992), covering 150 different speakers
● The focus was on PP-attachment ambiguity and relative clause attachment (RC-attachment) ambiguity
● An algorithm was developed to identify instances of such ambiguities in the syntactic data:
○ PP-attachment: instances of a noun phrase (NP) immediately followed by a prepositional phrase (PP)
○ RC-attachment: instances of an NP immediately followed by a relative clause (SBAR)
○ Low attachment is when there is a larger NP spanning both constituents (NP + PP or NP + SBAR); otherwise high attachment
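The identification algorithm can be sketched as follows (a minimal illustration over bracketed constituency trees, not the original implementation; the nested-list tree encoding is an assumption):

```python
def leaves(node):
    """Collect the words under a constituency-tree node.
    Trees are nested lists [label, child, ...]; children are
    subtrees or word strings."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def attachment_instances(node, right_label="PP", found=None):
    """Scan a tree for an NP immediately followed by a sibling with
    right_label ("PP" for PP-attachment, "SBAR" for RC-attachment).
    Low attachment when the parent node is itself an NP spanning both
    constituents; otherwise high attachment."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    kids = [c for c in node[1:] if not isinstance(c, str)]
    for left, right in zip(kids, kids[1:]):
        if left[0] == "NP" and right[0] == right_label:
            label = "low" if node[0] == "NP" else "high"
            found.append((" ".join(leaves(left)), " ".join(leaves(right)), label))
    for child in kids:
        attachment_instances(child, right_label, found)
    return found
```

For "I saw the boy with the telescope", the low-attachment parse places the PP inside a larger NP ("the boy with the telescope"), while the high-attachment parse makes the PP a sibling of the NP under the VP, and the sketch labels the instances accordingly.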
28. Examples of sentences with PP-attachment identified by the algorithm
(low-attachment and high-attachment parse trees shown)
29. Examples of sentences with RC-attachment identified by the algorithm
(low-attachment and high-attachment parse trees shown)
30. Attachment affects the distribution of prosodic breaks
● At the end of NP (before PP or SBAR), identify ToBI break index (from the
Switchboard corpus)
● Effect of RC-attachment is much stronger than PP-attachment
(break-index distributions shown for PP-attachment and RC-attachment)
31. Phrase length also affects the distribution of prosodic breaks
● Consistent with Shafran and Fodor (2016) and Watson and Gibson (2004), phrase length affects the likelihood of prosodic breaks
● More than 75% of high-attachment instances with ToBI 1 have short NPs (<3 words)
● 50% of low-attachment instances with ToBI 3, 4 have longer PPs (4+ words)

attachment    ToBI 0        ToBI 1          ToBI 2       ToBI 3        ToBI 4         Grand Total
high          27 (0.96%)    842 (30.04%)    44 (1.57%)   145 (5.17%)   212 (7.56%)    1270 (45.31%)
low           166 (5.92%)   1139 (40.64%)   43 (1.53%)   89 (3.18%)    96 (3.42%)     1533 (54.69%)
Grand Total   193 (6.89%)   1981 (70.67%)   87 (3.10%)   234 (8.35%)   308 (10.99%)   2803 (100.00%)

NP phrase length, high-attachment instances with ToBI 1:
NP length   Count (%)
1 word      425 (15.16%)
2 words     228 (8.13%)
3 words     101 (3.60%)
4+ words    88 (3.14%)
Total       842 (30.04%)

PP phrase length, low-attachment instances with ToBI 3 and 4:
PP length   ToBI 3        ToBI 4
1 word      1 (0.04%)     -
2 words     21 (0.75%)    16 (0.57%)
3 words     23 (0.82%)    32 (1.14%)
4+ words    44 (1.57%)    48 (1.71%)
Total       89 (3.18%)    96 (3.42%)
32. Can we predict attachment from phrase length and prosody?
● 2803 instances of PP-attachment:
○ 1270 high attachment (45%)
○ 1533 low attachment (55%)
● 1559 instances of RC-attachment:
○ 739 high attachment (47%)
○ 820 low attachment (53%)

Sample of the data compiled (features and label):
ambiguity   sentence ID     NP size (words)   PP/SBAR size (words)   ToBI break index   low attachment
ppa         sw4890.A-s89    1                 3                      1                  FALSE
ppa         sw4890.B-s72    2                 3                      4                  FALSE
ppa         sw2018.A-s144   1                 2                      1                  TRUE
ppa         sw2018.A-s145   1                 2                      1                  TRUE
ppa         sw2018.A-s157   1                 2                      3                  TRUE
rca         sw4890.B-s72    1                 4                      3                  FALSE
rca         sw4890.B-s73    3                 4                      4                  FALSE
rca         sw4890.B-s8     1                 4                      4                  FALSE
rca         sw2018.A-s131   3                 4                      1                  TRUE
rca         sw2018.B-s163   3                 4                      4                  TRUE
33. Machine learning predicts attachment based on prosody and phrase length
● Using machine learning (decision trees), with different feature combinations
● PP-attachment prediction using prosody only is statistically significant (p-value < .001, independent t-test)
● There is an improvement when prosody is combined with phrase length, but it is not statistically significant (p-value .078)

Set Description             Features                             Accuracy (%)
RC-attachment instances     ToBI                                 69.02
                            Length of NP                         62.80
                            Length of NP, ToBI                   71.14
                            Length of NP, Length of SBAR         64.85
                            Length of NP, Length of SBAR, ToBI   71.20
                            Length of SBAR                       57.79
                            Length of SBAR, ToBI                 69.60
PP-attachment instances     ToBI                                 60.54
                            Length of NP                         60.93
                            Length of NP, ToBI                   63.47
                            Length of NP, Length of PP           61.04
                            Length of NP, Length of PP, ToBI     63.40
                            Length of PP                         55.19
                            Length of PP, ToBI                   60.86
35. Can prosody be used to improve parsing?
Goal: build a computational system that uses prosody to improve parsing of spontaneous sentences in the Switchboard Corpus
Motivation: previous computational approaches (e.g. Kahn et al. (2005), Huang and Harper (2010), Tran et al. (2017)) attempted this. This work proceeds in the same direction, informed by the theoretical foundation of the syntax-prosody relationship, mainly semantic coherence
36. Hypothesis: Syntax-prosody correspondences improve parsing
Hypothesis 1: There are elements of correspondence between prosody and syntax that can be extracted from the syntactic structure
Hypothesis 2: Using these correspondences, along with prosodic information, we can select the most appropriate parse for an utterance
37. Parsing is identifying the structure of a sentence
● Constituency parsing: a hierarchy of syntactic constituents
● Dependency parsing: dependent-head relationships
○ Main metric: Unlabeled Attachment Score (UAS), the percentage of heads identified correctly
● Dependency parsing is now the norm in computational linguistics:
○ Faster, scalable to new languages, represents semantic relationships
○ Provides the same information as constituency parsing, plus head information
● Dependency structure has not been used much in prosody research
○ Exception: Pate and Goldwater, 2014
(constituency and dependency parses of an example sentence shown)
38. Semantic coherence affects likelihood of prosodic breaks
● Selkirk (1984): distribution of intonational phrase boundaries can be accounted
for by a semantic constraint called the Sense Unit Condition (SUC):
○ The immediate constituents of an intonational phrase must be semantically related
■ a. John gave the book // to Mary.
■ b. * John gave // the book to Mary.
■ c. John gave // the book // to Mary (examples from Watson and Gibson, 2004)
● Ferreira (1988) and Watson and Gibson (2004): developed algorithms for
predicting likelihood of prosodic breaks, predicting higher likelihood when there
is no dependency and semantic coherence between words
39. Dependency configurations correspond to semantic coherence
● The concept of “dependency configurations” is proposed here to quantify
semantic coherence between adjacent words, based on dependency
structure
● It is defined in terms of dependency offsets: the distance (measured by
number of words) between a word and its head
● For each word, the offset is quantified as:
○ 0 if the word is root
○ +1 if it depends on the word immediately to the right
○ +2 if it depends on a word further to the right
○ -1 if it depends on the word immediately to the left
○ -2 if it depends on a word further to the left
● Each pair of consecutive words is characterized by a duple of offsets (e.g. (+1, -2)) describing its configuration
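The offset quantization above can be sketched as follows (a minimal illustration; the CoNLL-style head encoding, with 0 for the root, is an assumption):

```python
def dependency_offsets(heads):
    """heads[i] gives the 1-based position of the head of word i+1,
    with 0 for the root (CoNLL-style). Quantize each word's offset:
    0 for the root; +1/-1 if the head is the adjacent word to the
    right/left; +2/-2 if the head is further to the right/left."""
    offsets = []
    for pos, head in enumerate(heads, start=1):
        if head == 0:
            offsets.append(0)
        elif head == pos + 1:
            offsets.append(+1)
        elif head > pos:
            offsets.append(+2)
        elif head == pos - 1:
            offsets.append(-1)
        else:
            offsets.append(-2)
    return offsets

def dependency_configurations(heads):
    """Pair consecutive words' offsets into configuration duples."""
    offsets = dependency_offsets(heads)
    return list(zip(offsets, offsets[1:]))
```

For "I saw the boy" with heads [2, 0, 4, 2] (I→saw, saw=root, the→boy, boy→saw), the offsets are (+1, 0, +1, -2) and the configurations are (+1, 0), (0, +1), (+1, -2).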
40. There are 12 different observed dependency configurations
Examples from the Switchboard Corpus, converted to dependency structure by
Honnibal and Johnson (2014)
41. Dependency configurations correspond to prosodic breaks
● Configuration (-1, +1) accounts for 35% of ToBI 4 and 26% of ToBI 3
● Configurations (+2, +1) and (-1, -2) combined account for 41% of ToBI 4 and 38% of ToBI 3
● If there is a direct dependency between two consecutive words, there is a smaller likelihood of a prosodic break between them

configuration   ToBI 1   ToBI 2   ToBI 3   ToBI 4   Grand Total
(+2, +1)        15657    1550     602      860      18669
(+1, -2)        10328    402      203      135      11068
(-1, +1)        6125     557      726      1456     8864
(-1, -1)        6062     281      308      358      7009
(+1, 0)         6032     233      114      94       6473
(-1, -2)        3268     334      465      829      4896
(+1, +1)        3308     102      42       30       3482
(0, -1)         2802     130      120      115      3167
(0, +1)         2645     130      174      154      3103
(+2, -1)        1011     25       31       22       1089
(-1, 0)         117      11       27       62       217
(0, 0)          74       30       2        7        113
Grand Total     57429    3785     2814     4122     68150
42. Features are extracted from parse hypotheses and prosodic information
Lexical features: word, head word; syntactic: POS, configuration; prosodic: normalized duration, pause after

word   head word   POS   config     normalized duration   pause after
so     know        RB    (+2, +1)   3.23                  0.31
i      know        PRP   (+1, 0)    0.45                  0
know   -           VBP   (0, +1)    1.13                  0
what   said        WP    (+2, +1)   0.97                  0
they   said        PRP   (+2, +1)   1.62                  0
've    said        VBP   (+1, -2)   0.71                  0
said   know        VBN   N/A        2.03                  0
44. Recurrent Neural Networks offer a lot of flexibility
- RNNs accept inputs of variable length, with categorical and continuous features, and produce variable-length output
- Long Short-Term Memory (LSTM), a variant of RNNs designed to better capture long-range dependencies, is used here
source: https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/03/1-768x421.png
45. System extracts features from parses and predicts correct heads
Training stage: sentence + parse hypothesis + sentence acoustics → feature extraction; the gold-standard parse provides the correct heads as the training outcome
Testing stage: sentence + parse hypothesis + sentence acoustics → feature extraction → predicted correct heads for each parse hypothesis
46. System scores parses and selects the most likely parse
Using syntactic features from the parse hypotheses and acoustic information, the system makes per-word predictions about which parse is more likely; the predictions are summed into a parse score:
spaCy (UAS: 0.83):     0.75 0.87 0.69 1.01 0.75 1.04   →  Sum: 5.11
clearNLP (UAS: 0.67):  0.75 0.77 0.78 1.02 0.58 0.88   →  Sum: 4.78
syntaxnet (UAS: 0.33): 0.03 0.76 0.30 0.89 0.33 0.84   →  Sum: 3.15
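The scoring-and-selection step can be sketched as follows (a hypothetical helper, not the original implementation; the per-word scores are the predictions shown on this slide):

```python
def select_parse(parse_scores):
    """parse_scores: mapping parser name -> list of per-word prediction
    scores for that parser's hypothesis. Sum each hypothesis's per-word
    scores and return (best_parser, totals), where the best parse is the
    one with the highest total."""
    totals = {name: round(sum(scores), 2) for name, scores in parse_scores.items()}
    best = max(totals, key=totals.get)
    return best, totals
```

With the scores above, spaCy's hypothesis gets the highest sum (5.11) and is selected, matching the parser with the highest UAS.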
47. Prosody can improve parsing for Switchboard data
System      Text features                  Prosodic features     UAS Dev   UAS Test
clearnlp    -                              -                     79.76     79.59
spacy       -                              -                     79.06     78.91
syntaxnet   -                              -                     72.54     72.81
Oracle      -                              -                     85.93     85.89
Ensemble    POS, configs                   -                     80.69     80.73
Ensemble    POS, configs                   Dur, dur log, pause   81.21     81.17
Ensemble    Lexical, POS, configs, links   -                     83.47     83.36
Ensemble    Lexical, POS, configs, links   Dur, pause            83.51     83.39
48. Further improvements are possible
● Only duration and pauses were used; pitch, intensity, and other acoustic information can still be used in further work
● Phrase length information was not used in any of the features
● Speech repairs and disfluencies are marked by prosodic cues but were not addressed in this study
● Other correspondences between prosody and dependency structure were suggested in this study but need further development (dependency chunks)
49. Output analysis doesn't indicate clear improvement patterns
● The output analysis did not show clear improvement patterns in the following:
○ sentences with PP-attachment
○ sentences with RC-attachment
○ sentences with parentheticals
○ sentences with speech repairs
● By sentence size, the largest improvement was for sentences of 3-8 words; patterns for larger sentences are unclear, but the improvement is mainly smaller

                                                            UAS Dev   UAS Test
UAS improvement (POS + configs) with prosody                0.51      0.44
Sentences with improved UAS                                 268       240
Sentences with worse UAS                                    178       180
Sentences with the same UAS                                 4970      5036
p-value (paired t-test comparing UAS for all sentences)     < .001    < .001
50. Conclusions
● Part 1:
○ Syntactic phrase length affects prosodic phrasing, also in French
● Part 2:
○ Syntactic ambiguity can be resolved prosodically by speakers
○ Prosodic cues can be used by human listeners and computers to predict the syntactic structure
○ Syntactic phrase length also affects prosodic phrasing in speaking, and can be used by
computers as a factor, along with prosody, to improve prediction of the structure
● Part 3:
○ Certain syntactic information (dependency configurations), based on dependency structure,
relates to prosodic breaks
○ Using this information together with timing (pause and duration) is more useful for selecting
better parses than syntactic information only
○ The ensemble system yields better performance than any individual parser in the ensemble
51. Final Note
● This dissertation is an interdisciplinary work, building on prosody research
from phonology and psycholinguistics towards computational goals
● Using dependency structure can provide a new perspective for investigating the syntax-prosody relationship