This document describes using anomaly detection methods to identify word-level annotation errors in text-to-speech corpora. It compares three anomaly detection models - Univariate Gaussian Distribution, Multivariate Gaussian Distribution, and One-Class SVM - on a Czech speech corpus. The models were trained on correctly annotated words and tested on words with known annotation errors. When carefully configured using feature selection and model parameter optimization, the models achieved an F1 score of around 89% at detecting misannotated words, comparable to classification-based error detection methods but without needing examples of errors for training. Dimensionality reduction techniques produced similar results to carefully selected features. Future work could analyze error types and test the approaches on more varied speech data.
Anomaly-Based Annotation Errors Detection in TTS Corpora
Jindřich Matoušek and Daniel Tihelka
Dept. of Cybernetics, NTIS – New Technologies for the Information Society
Faculty of Applied Sciences, University of West Bohemia, Czech Republic
jmatouse@kky.zcu.cz, dtihelka@ntis.zcu.cz
Introduction
Concatenative Speech Synthesis
• unit selection still a very popular approach to speech synthesis
• nearly natural-sounding speech when enough data in the given style is available
Unit Selection Disadvantages
• very large speech corpora (>10 hours of speech)
• high-quality consistent recordings (studio, voice quality, style, . . . )
• need for very precise annotation
• wrong word-level annotation causes gross synthesis errors [1]!
• speech signal does not correspond to the annotation
« could result in unintelligible speech!
Misannotation Example
[Figure: source recording vs. synthetic speech, comparing the wrong and the correct annotation]
Aim of this Study
• use anomaly detection to detect word-level annotation errors
« detected errors could be fixed/removed from the speech corpus
Anomaly Detection Methods
Problem Definition
• anomaly (novelty/outlier) detection = identification of items which do not conform to an expected pattern
• misannotated words considered as anomalous examples
• correctly annotated words taken as normal examples
• unsupervised technique under the assumption that examples are not polluted by anomalies
Notation: $N_n$ . . . number of normal examples; $N_f$ . . . number of features; $x^{(1)}, \ldots, x^{(N_n)}$ . . . training set of normal examples; $x^{(i)} \in \mathbb{R}^{N_f}$ . . . $i$-th example
Univariate Gaussian Distribution (UGD)
• each feature modeled separately with mean $\mu_j \in \mathbb{R}$ and variance $\sigma_j^2 \in \mathbb{R}$: $x_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$
• assumption of feature independence
• training – fitting parameters $\mu_j$, $\sigma_j^2$:
  $\mu_j = \frac{1}{N_n}\sum_{i=1}^{N_n} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{N_n}\sum_{i=1}^{N_n}\bigl(x_j^{(i)} - \mu_j\bigr)^2$
• probability of a new example $x$:
  $p(x) = \prod_{j=1}^{N_f} p(x_j;\, \mu_j, \sigma_j^2) = \prod_{j=1}^{N_f} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\!\Bigl(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\Bigr)$
• anomaly detection: if $p(x) < \varepsilon$ ⇒ $x$ is anomalous
• model parameter:
  $\varepsilon$ . . . threshold probability value to distinguish between normal and anomalous examples
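The UGD detector above reduces to a few lines of array arithmetic. The following is a minimal NumPy sketch, assuming the word-level feature matrices are already standardized; the function names, the log-space thresholding, and the default ε (taken from the UGD* row of the model table) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fit_ugd(X_train):
    """Per-feature means and variances estimated on normal examples
    (rows = word examples, columns = features)."""
    return X_train.mean(axis=0), X_train.var(axis=0)   # 1/N_n normalization, as in the poster

def ugd_log_prob(X, mu, var):
    """Sum of per-feature log Gaussian densities (log of the product p(x))."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (X - mu) ** 2 / (2 * var), axis=1)

def ugd_is_anomalous(X, mu, var, eps=0.005):
    """A word is flagged as misannotated when p(x) < eps (evaluated in log-space)."""
    return ugd_log_prob(X, mu, var) < np.log(eps)
```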
Multivariate Gaussian Distribution (MGD)
• $p(x)$ modeled in one go using mean vector $\mu \in \mathbb{R}^{N_f}$ and covariance matrix $\Sigma \in \mathbb{R}^{N_f \times N_f}$: $x \sim \mathcal{N}_{N_f}(\mu, \Sigma)$
• training – fitting parameters $\mu$, $\Sigma$:
  $\mu = \frac{1}{N_n}\sum_{i=1}^{N_n} x^{(i)}, \qquad \Sigma = \frac{1}{N_n}\sum_{i=1}^{N_n}\bigl(x^{(i)} - \mu\bigr)\bigl(x^{(i)} - \mu\bigr)^{\top}$
• probability of a new example $x$:
  $p(x) = \frac{1}{\sqrt{(2\pi)^{N_f}|\Sigma|}}\exp\!\Bigl(-\frac{1}{2}(x - \mu)^{\top}\Sigma^{-1}(x - \mu)\Bigr)$
• anomaly detection: if $p(x) < \varepsilon$ ⇒ $x$ is anomalous
• model parameter:
  $\varepsilon$ . . . threshold probability value to distinguish between normal and anomalous examples
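A corresponding sketch for the multivariate model, again on standardized features; scipy.stats.multivariate_normal is used here for the density and the default ε is the MGD* value from the model table — one possible implementation rather than the one used in the poster.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_mgd(X_train):
    """Mean vector and full covariance matrix estimated on normal examples."""
    mu = X_train.mean(axis=0)
    sigma = np.cov(X_train, rowvar=False, bias=True)   # 1/N_n normalization
    return mu, sigma

def mgd_is_anomalous(X, mu, sigma, eps=2.5e-14):
    """A word is flagged as misannotated when the joint density p(x) < eps."""
    log_p = multivariate_normal(mean=mu, cov=sigma, allow_singular=True).logpdf(X)
    return log_p < np.log(eps)
```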
One-Class SVM (OCSVM)
• mapping of input data into a high dimensional feature space via a kernel function
• finding maximal margin hyperplane which best separates training data from the origin
• training – determining hyperplane parameters $w$, $\rho$ (quadratic programming problem):
  $\min_{w,\xi,\rho}\; \frac{1}{2}\|w\|^2 + \frac{1}{\nu N_n}\sum_{i=1}^{N_n}\xi_i - \rho \quad \text{s.t.}\quad w \cdot \Phi(x^{(i)}) \ge \rho - \xi_i,\; i = 1, 2, \ldots, N_n,\; \xi_i \ge 0$
• Gaussian radial basis function used as kernel function:
  $K(x, x') = \exp\bigl(-\gamma\|x - x'\|^2\bigr)$
• binary decision function ($\alpha_i$ . . . Lagrange multipliers):
  $f(x) = \operatorname{sgn}\bigl(w \cdot \Phi(x) - \rho\bigr) = \operatorname{sgn}\Bigl(\sum_{i=1}^{N_n} \alpha_i K(x^{(i)}, x) - \rho\Bigr)$
• anomaly detection: $f(x) = +1$ ⇒ $x$ is normal, $f(x) = -1$ ⇒ $x$ is anomalous
• model parameters:
ν represents an upper bound on the fraction of possibly anomalous examples
γ kernel parameter
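Since the poster reports using scikit-learn [4], the one-class SVM maps naturally onto sklearn.svm.OneClassSVM. The sketch below plugs in the ν and γ values listed for OCSVM* in the model-selection table; the matrix names (X_norm_train, X_eval) and the placement of the standardization step are assumptions.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Standardization fitted on the normal (training) words only.
scaler = StandardScaler().fit(X_norm_train)

# nu and gamma set to the OCSVM* values from the model-selection table.
ocsvm = OneClassSVM(kernel="rbf", nu=0.005, gamma=0.03125)
ocsvm.fit(scaler.transform(X_norm_train))

# predict() returns +1 for normal and -1 for anomalous words.
is_misannotated = ocsvm.predict(scaler.transform(X_eval)) == -1
```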
References
[1] J. Matoušek, D. Tihelka, and L. Šmídl, “On the impact of annotation errors on unit-selection speech synthesis,”
Text, Speech and Dialogue, ser. Lecture Notes in Computer Science, vol. 7499, 2012.
[2] C.-Y. Lin and R. Jang, “Automatic phonetic segmentation by score predictive model for the corpora of Mandarin
singing voices,” IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 7, 2007.
[3] J. Matoušek and D. Tihelka, “Annotation errors detection in TTS Corpora,” in INTERSPEECH 2013.
[4] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, 2011.
[5] T. G. Dietterich, “Approximate statistical tests for comparing supervised classification algorithms,” Neural
Comput., vol. 10, 1998.
Experimental Data & Features
Speech Data
• Czech read-speech single-speaker corpus
• “news-broadcasting style”, no emotions, . . .
• recordings forced-aligned to phones (HTK)
• normal examples: 1124 correctly annotated words
• anomalous examples: 273 misannotated words
• misannotated words collected by human experts during TTS system evaluation
Feature Extraction & Collection
[Diagram: phone-level features (basic, acoustic, spectral, other) extracted for each phone are converted to word-level features via word-level statistics, word-level histograms, and deviations from CART-predicted values; phonetic and positional features are added at the word, phrase, and utterance level.]
Features
• based on intuitions when checking forced alignment
• phone- and word-level features
• z-score deviations from CART-predicted phone-level values used to emphasize phone-level anomalies
• phone-level to word-level features conversion (see the sketch below):
  • mean/median, min., max. phone-level feature value
  • range of feature values
  • within-word anomalies emphasized by histograms
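As a rough illustration of the phone-to-word conversion, the hypothetical helper below aggregates the values of one phone-level feature across the phones of a word into word-level statistics plus a fixed-size histogram; the bin count and histogram range (chosen as if the values were z-scores) are purely illustrative and not taken from the poster.

```python
import numpy as np

def word_level_features(phone_values, n_bins=5, hist_range=(-3.0, 3.0)):
    """Hypothetical helper: aggregate one phone-level feature over the phones
    of a word into word-level statistics and a histogram."""
    v = np.asarray(phone_values, dtype=float)
    stats = [v.mean(), np.median(v), v.min(), v.max(), v.max() - v.min()]
    # The histogram keeps within-word anomalies visible even after aggregation.
    hist, _ = np.histogram(v, bins=n_bins, range=hist_range)
    return np.concatenate([stats, hist])
```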
Summary of Features
Phone-level features
  Basic . . . duration, forced-aligned acoustic likelihood
  Acoustic . . . energy, formants (F1, F2, F3, F2/F1), fundamental frequency (F0), zero crossing, voiced/unvoiced ratio
  Spectral . . . spectral crest factor, rolloff, flatness, centroid, spread, kurtosis, skewness, harmonic-to-noise ratio
  Other . . . score predictive model (SPM) [2], energy/duration ratio, spectral centroid/duration ratio
Word-level features
  Phonetic . . . phonetic voicedness ratio, sonority ratio, syllabic consonants ratio, articulation manner distribution, articulation place distribution, word boundary voicedness match [3]
  Positional . . . forward/backward position of word/phrase in phrase/utterance, position of the phrase in the utterance
Experiments
Model Training and Selection
• Normal examples
• 60% (674) used for training
• 20% (225) used for validation
• 20% (225) used for evaluation
• Anomalous examples
• 50% (136) used for validation
• 50% (137) used for evaluation
• none used for training
• Model training and selection
• features standardized to have zero mean and unit variance
• models’ parameters optimized with respect to F1 score using a grid search with 10-fold cross-validation (see the sketch below)
• various feature set combinations were also part of model selection
• scikit-learn toolkit [4] used
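The poster does not spell out exactly how the grid search, the 10-fold cross-validation, and the anomaly-free training data interact, so the sketch below shows one plausible reading for the OCSVM: each (ν, γ) candidate is fitted on the normal training examples only and scored by mean F1 over folds of the labelled validation data. Function and variable names are hypothetical.

```python
import numpy as np
from itertools import product
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import OneClassSVM

def select_ocsvm(X_norm_train, X_val, y_val, nus, gammas, n_splits=10):
    """Grid search over (nu, gamma): fit each candidate on normal training
    examples only, score it by mean F1 over folds of the labelled
    validation data (y_val: 1 = misannotated, 0 = correctly annotated)."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    best_params, best_f1 = None, -1.0
    for nu, gamma in product(nus, gammas):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_norm_train)
        fold_f1 = []
        for _, idx in folds.split(X_val, y_val):
            pred = (model.predict(X_val[idx]) == -1).astype(int)
            fold_f1.append(f1_score(y_val[idx], pred))
        if np.mean(fold_f1) > best_f1:
            best_params, best_f1 = (nu, gamma), np.mean(fold_f1)
    return best_params, best_f1
```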
UGD, MGD, OCSVM . . . anomaly detection models
model* . . . optimally selected models
model0 . . . models trained on basic features
modelall . . . models trained on all features
modeldim . . . models with the best reduced feature set
Model ID    Parameters               Features (#)
UGD*        ε = 0.005                duration: stats + histogram + z-score,
MGD*        ε = 2.5e-14              acoustic likelihood: stats + histogram,
                                     energy: z-score (28)
OCSVM*      ν = 0.005, γ = 0.03125   duration: stats + histogram + z-score,
                                     acoustic likelihood: stats + histogram,
                                     energy/duration: stats (28)
UGDdim      ε = 5.0e-24              PCA (20)
MGDdim      ε = 5.0e-24              PCA (20)
OCSVMdim    ν = 0.125, γ = 0.125     ICA (30)
UGD0        ε = 2.0e-7               duration: stats,
MGD0        ε = 7.9e-4               acoustic likelihood: stats (8)
OCSVM0      ν = 0.05, γ = 0.25
UGDall      —                        all features (359)
MGDall      —
OCSVMall    ν = 0.075, γ = 2.4e-4
(— means no optimal values were found)
Dimensionality Reduction
• automatic selection of the best feature combination using dimensionality reduction techniques
• number of features seen as another parameter of the model selection process (see the sketch below)
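One way to treat the number of retained components as just another hyperparameter is to enumerate one candidate pipeline per (reduction method, n_components) pair; the sketch below does this for PCA and ICA with illustrative component counts, not the grid actually searched in the poster.

```python
from sklearn.decomposition import PCA, FastICA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# One candidate pipeline per (reduction method, number of components);
# the component counts below are illustrative.
candidates = {}
for n in (10, 20, 30):
    candidates[f"PCA-{n}"] = make_pipeline(
        StandardScaler(), PCA(n_components=n), OneClassSVM(kernel="rbf"))
    candidates[f"ICA-{n}"] = make_pipeline(
        StandardScaler(), FastICA(n_components=n), OneClassSVM(kernel="rbf"))

# Each pipeline is then fitted on the normal examples and scored
# (e.g. by F1 on the validation data) like the models without reduction.
```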
Comparison on validation set
PCA . . . principal component analysis
ICA . . . independent component analysis
FAG . . . feature agglomeration
CVF . . . features selected by cross-validation (i.e. UGD*, MGD*, OCSVM*)
[Bar chart: F1 [%] on the validation set (approx. 72–90 %) for UGD, MGD, and OCSVM under PCA, ICA, FAG, and CVF feature selection]
Evaluation & Results
Detection Metrics
• Precision: $P = \frac{tp}{pp}$
• Recall: $R = \frac{tp}{ap}$
• F1 score: $F_1 = \frac{2 \cdot P \cdot R}{P + R}$
  $tp$ . . . number of words correctly detected as misannotated
  $pp$ . . . number of all words detected as misannotated
  $ap$ . . . number of actual misannotated words
Statistical significance of the results was assessed with McNemar’s test (α = 0.05) [5].
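For concreteness, the small helper below evaluates the three formulas; the example counts are made up, but chosen to be consistent with the MGD* row of the results table (ap = 137 anomalous evaluation words).

```python
def detection_metrics(tp, pp, ap):
    """Precision, recall and F1 from the counts defined above."""
    precision = tp / pp
    recall = tp / ap
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 124 of 142 detections are truly misannotated, out of 137 actual misannotations:
# gives approx. (0.8732, 0.9051, 0.8889), i.e. the MGD* row of the results table.
print(detection_metrics(tp=124, pp=142, ap=137))
```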
Discussion
• dimensionality reduction techniques achieve results similar to feature combinations carefully selected by cross-validation
• OCSVM with all features achieves comparable results
• features emphasizing anomalies (z-score deviations from CART-predicted values, histograms) are important
• spectral, phonetic, and positional features are not so important
Results
Model ID        P [%]   R [%]   F1 [%]
UGD*            84.83   89.78   87.23
MGD*            87.32   90.51   88.89
OCSVM*          85.71   87.59   86.64
UGDdim          88.15   86.86   87.50
MGDdim          88.15   86.86   87.50
OCSVMdim        85.40   85.40   85.40
UGD0            84.26   66.42   74.29
MGD0            76.03   81.02   78.45
OCSVM0          82.95   78.10   80.45
UGDall          37.85  100.00   54.91
MGDall          47.06   99.27   63.85
OCSVMall        87.97   85.40   86.67
random detect.  23.70   25.50   24.60
Conclusion & Future Work
• all three anomaly detection techniques performed similarly well, with F1 ≈ 89% when carefully configured using grid search and cross-validation
• results comparable with classification-based detection [3], but no misannotated words are needed to train an anomaly-based detector ⇒ training data collection should be easier
• Future work:
  • error analysis to spot any potential systematic trend in the misdetected words
  • performance on data from more speakers and/or more languages
  • performance on spontaneous (expressive) data
« annotate only words detected as erroneous when building a new TTS voice
This research was supported by the grant TAČR TA01030476. The access to the MetaCentrum clusters provided under the programme LM2010005 is highly appreciated.