SlideShare a Scribd company logo
1 of 31
Download to read offline
How do Voices from Past Speech Synthesis Challenges
Compare Today?
Erica Cooper, Junichi Yamagishi
August 28, 2021
National Institute of Informatics, Japan
Table of contents
1. Introduction
2. Listening Test
3. Results
4. Analysis of Natural Speech
5. Discussion and Future Work
1
Introduction
Project
• A very large-scale listening test combining samples from many past Blizzard
Challenges and Voice Conversion Challenges
• MOS results from separate past listening tests cannot be meaningfully combined and
compared because the set of systems and therefore the context of the test are completely
different.
• Conducting a new test with these samples combined will enable more direct comparisons.
• The data gathered can be used to train MOS prediction models.
2
Motivation
• How reliable and reproducible are MOS scores?
• How do past listening test results compare to ratings gathered in the present day?
• Will the results correlate even when the listening test context has changed?
• What observations can we make about speech synthesis and voice conversion systems
over the years?
• What are the effects of the speaker of the dataset on synthesized speech quality?
• Collect a very large-scale database of a variety of synthesizer outputs and their
MOS ratings for the purpose of training MOSnet-like systems for automatic MOS
prediction
3
Listening Test
Large-scale listening test
• 187 different systems from past BC, VCC, and ESPnet TTS
• BC 2008, 2009, 2010, 2011, 2013, 2016 (English EH1 tasks only)
• VCC 2016, 2018, 2020 (same-language task only)
• ESPnet TTS: samples from ICASSP 2020 trained on LJSpeech
• 38 utterances per system
• Samples are balanced over genre where relevant
• Samples are balanced over source and target speakers for VCC
• Only genres included in the original naturalness tests are included
• Test design
• One sample from each system per set
• One listener rates one set containing 187 samples
• Coverage of 8 Japanese listeners per set; 304 total listeners
• MOS rating for naturalness on a scale of 1-5
• Significant differences: Mann-Whitney U test (p<0.05) with Bonferroni correction
4
Results
Listening test results
VCC2018-N06
VCC2018-N16
VCC2020-T14
VCC2016-C
VCC2016-baseline
VCC2016-I
VCC2020-T26
VCC2018-N19
VCC2018-N18
BC2016-C
VCC2020-T21
VCC2018-N07
VCC2018-N09
BC2013-P
VCC2016-M
VCC2018-D01
VCC2016-H
VCC2016-E
VCC2018-N20
VCC2020-T17
VCC2020-T18
VCC2018-N14
VCC2018-D03
VCC2018-D05
VCC2018-N11
BC2016-J
BC2010-Q
VCC2018-N03
VCC2016-A
BC2016-K
BC2016-N
VCC2020-T09
VCC2016-D
VCC2018-D04
VCC2016-G
BC2016-P
VCC2018-N13
VCC2018-N05
BC2016-E
VCC2016-Q
VCC2018-D02
BC2016-I
VCC2020-T02
BC2016-G
VCC2016-B
BC2009-M
BC2011-I
BC2008-I
VCC2018-N15
BC2013-F
VCC2020-T19
BC2009-P
VCC2018-N12
VCC2020-T06
BC2013-B
BC2013-H
BC2009-R
BC2016-D
BC2016-O
VCC2020-T01
BC2008-R
VCC2020-T31
VCC2018-N17
BC2016-F
VCC2016-F
VCC2020-T16
VCC2016-N
VCC2020-T08
VCC2016-J
BC2009-E
BC2009-Q
VCC2016-P
VCC2018-N08
VCC2016-K
VCC2018-N04
BC2009-T
BC2008-F
BC2008-D
VCC2020-T12
VCC2020-T28
BC2016-H
BC2008-H
VCC2016-L
BC2008-N
BC2010-O
VCC2020-T03
BC2008-T
BC2009-W
BC2008-L
BC2008-P
VCC2016-O
BC2008-Q
VCC2020-T24
BC2009-J
ESPnet-merlin
VCC2020-T04
BC2016-B
BC2008-G
BC2009-D
BC2011-J
BC2008-E
VCC2018-B01
BC2009-O
BC2008-M
BC2009-B
BC2008-C
VCC2020-T20
VCC2020-T23
BC2008-V
VCC2020-T22
BC2010-C
BC2008-O
BC2010-L
BC2009-C
ESPnet-fastspeechv2
BC2008-B
BC2009-I
BC2016-M
BC2008-K
VCC2020-T32
BC2009-L
BC2009-H
BC2011-B
BC2016-Q
VCC2020-T07
BC2013-N
BC2008-S
ESPnet-fastspeechv3
BC2010-R
BC2010-H
BC2010-G
BC2010-N
BC2013-L
VCC2020-T33
BC2010-U
BC2013-C
BC2013-I
VCC2018-N10
BC2010-B
BC2011-M
BC2011-D
VCC2020-T25
BC2010-S
VCC2020-T27
BC2016-L
BC2008-J
BC2011-F
BC2013-K
VCC2020-T30
BC2010-P
BC2011-K
BC2011-L
BC2011-C
VCC2020-T13
BC2009-K
VCC2020-T29
VCC2020-T11
BC2010-F
ESPnet-mozilla
VCC2020-T35
BC2011-H
BC2016-A
BC2011-E
BC2009-S
VCC2018-T00
VCC2020-T10
BC2010-V
VCC2016-target
BC2010-T
VCC2016-source
BC2008-A
VCC2018-S00
VCC2020-T34
BC2013-M
BC2010-J
ESPnet-tacotron2v2
BC2011-G
ESPnet-nvidia
ESPnet-tacotron2v3
BC2009-A
ESPnet-transformerv1
BC2010-M
BC2013-A
ESPnet-transformerv3
BC2010-A
ESPnet-raw
BC2011-A
1
2
3
4
5
0
50
100
150
200
5
Listening test results
Best systems
• ESPnet-transformerv3
• BC2010-M
• ESPnet-transformerv1
• ESPnet-tacotron2v3
• ESPnet-nvidia
Worst systems
• VCC2018-N06
• VCC2018-N16
• VCC2020-T14
• VCC2016-C
• VCC2016-baseline
VCC typically has a smaller
amount of data
6
Listening test results
Best systems
• ESPnet-transformerv3
• BC2010-M
• ESPnet-transformerv1
• ESPnet-tacotron2v3
• ESPnet-nvidia
Worst systems
• VCC2018-N06
• VCC2018-N16
• VCC2020-T14
• VCC2016-C
• VCC2016-baseline
VCC typically has a smaller
amount of data
7
Listening test: Agreement
Agreement = 0.5 (Krippendorff’s
Alpha and Intra-Class Correlation)
Merlin has the largest variation in
scores 8
Do new MOS results correlate with original ones?
System-level and utterance-level Pearson’s correlation coefficient, Spearman Rank correlation
coefficient, and root mean squared error between original and new listening test results by challenge or
set of systems
System-level Utterance-level
Challenge PCC SRCC RMSE PCC SRCC RMSE
BC2008 0.93 0.89 0.33 0.70 0.67 0.62
BC2009 0.97 0.95 0.48 0.76 0.72 0.64
BC2010 0.93 0.98 0.66 0.74 0.73 0.85
BC2011 0.91 0.90 0.76 0.76 0.67 0.87
BC2013 0.97 0.98 0.49 - - -
BC2016 0.97 0.93 0.40 - - -
VCC2016 0.97 0.92 0.42 0.56 0.53 1.12
VCC2018 0.96 0.91 0.77 0.55 0.53 1.10
VCC2020 0.98 0.96 0.23 0.87 0.87 0.48
ESPnet 0.99 0.98 0.09 0.73 0.61 0.59
9
Do new MOS results correlate with original ones?
System-level and utterance-level Pearson’s correlation coefficient, Spearman Rank correlation
coefficient, and root mean squared error between original and new listening test results by challenge or
set of systems
System-level Utterance-level
Challenge PCC SRCC RMSE PCC SRCC RMSE
BC2008 0.93 0.89 0.33 0.70 0.67 0.62
BC2009 0.97 0.95 0.48 0.76 0.72 0.64
BC2010 0.93 0.98 0.66 0.74 0.73 0.85
BC2011 0.91 0.90 0.76 0.76 0.67 0.87
BC2013 0.97 0.98 0.49 - - -
BC2016 0.97 0.93 0.40 - - -
VCC2016 0.97 0.92 0.42 0.56 0.53 1.12
VCC2018 0.96 0.91 0.77 0.55 0.53 1.10
VCC2020 0.98 0.96 0.23 0.87 0.87 0.48
ESPnet 0.99 0.98 0.09 0.73 0.61 0.59
10
Do new MOS results correlate with original ones?
System-level and utterance-level Pearson’s correlation coefficient, Spearman Rank correlation
coefficient, and root mean squared error between original and new listening test results by challenge or
set of systems
System-level Utterance-level
Challenge PCC SRCC RMSE PCC SRCC RMSE
BC2008 0.93 0.89 0.33 0.70 0.67 0.62
BC2009 0.97 0.95 0.48 0.76 0.72 0.64
BC2010 0.93 0.98 0.66 0.74 0.73 0.85
BC2011 0.91 0.90 0.76 0.76 0.67 0.87
BC2013 0.97 0.98 0.49 - - -
BC2016 0.97 0.93 0.40 - - -
VCC2016 0.97 0.92 0.42 0.56 0.53 1.12
VCC2018 0.96 0.91 0.77 0.55 0.53 1.10
VCC2020 0.98 0.96 0.23 0.87 0.87 0.48
ESPnet 0.99 0.98 0.09 0.73 0.61 0.59
11
Do new MOS results correlate with original ones?
System-level and utterance-level Pearson’s correlation coefficient, Spearman Rank correlation
coefficient, and root mean squared error between original and new listening test results by challenge or
set of systems
System-level Utterance-level
Challenge PCC SRCC RMSE PCC SRCC RMSE
BC2008 0.93 0.89 0.33 0.70 0.67 0.62
BC2009 0.97 0.95 0.48 0.76 0.72 0.64
BC2010 0.93 0.98 0.66 0.74 0.73 0.85
BC2011 0.91 0.90 0.76 0.76 0.67 0.87
BC2013 0.97 0.98 0.49 - - -
BC2016 0.97 0.93 0.40 - - -
VCC2016 0.97 0.92 0.42 0.56 0.53 1.12
VCC2018 0.96 0.91 0.77 0.55 0.53 1.10
VCC2020 0.98 0.96 0.23 0.87 0.87 0.48
ESPnet 0.99 0.98 0.09 0.73 0.61 0.59
12
Do MOS scores improve year by year?
Best system in each challenge compared to the previous challenge’s best system
Year : Best system MOS Improved? Significant?
BC2008 : J 3.63
BC2009 : S 3.87 X x
BC2010 : M 4.27 X X
BC2011 : G 4.12 x x
BC2013 : M 4.01 x x
BC2016 : L 3.63 x X
VCC2016 : O 2.86
VCC2018 : N10 3.55 X X
VCC2020 : T10 3.88 X x
ESPnet : transformerv3 4.33
13
Do MOS scores improve year by year?
Best system in each challenge compared to the previous challenge’s best system
Year : Best system MOS Improved? Significant?
BC2008 : J 3.63
BC2009 : S 3.87 X x
BC2010 : M 4.27 X X
BC2011 : G 4.12 x x
BC2013 : M 4.01 x x
BC2016 : L 3.63 x X
VCC2016 : O 2.86
VCC2018 : N10 3.55 X X
VCC2020 : T10 3.88 X x
ESPnet : transformerv3 4.33
14
At what point did TTS quality reach that of natural speech?
Is the year’s best system significantly different from that year’s natural speech?
Year : Best system Significant difference from natural speech?
BC2008 : J X
BC2009 : S X
BC2010 : M x
BC2011 : G X
BC2013 : M x
BC2016 : L x
VCC2016 : O X
VCC2018 : N10 x
VCC2020 : T10 x
ESPnet : transformerv3 x
15
At what point did TTS quality reach that of natural speech?
Difference of each system from natural speech, computed from averaged
z-score-normalized ratings by listener for each challenge
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
2.5
2.0
1.5
1.0
0.5
0.0 BC2008
BC2009
BC2010
BC2011
BC2013
BC2016
ESPnet
16
Correlation with objective measures
• SNR: r=0.17
• P.563: r=0.05
• MOSnet trained on ASVspoof: r=0.03
Room for improvement of
objective measures
17
Correlation with objective measures
• SNR: r=0.17
• P.563: r=0.05
• MOSnet trained on ASVspoof: r=0.03
Room for improvement of
objective measures
18
Analysis of Natural Speech
Natural speech preferences and effects of corpus on TTS
• The effect of speech corpus on perceived TTS quality is well-documented:
• J. Williams, J. Rownicka, P. Oplustil, and S. King, “Comparison of speech representations
for automatic quality estimation in multi-speaker text-to- speech synthesis,” 2020
• F. Hinterleitner, C. Manolaina, and S. Moller, “Influence of a voice on the quality of
synthesized speech,” 2014
• Since every challenge uses a different corpus, this is a confounding factor to making
meaningful direct comparisons across challenges, but it is still important to capture
preferences regarding these factors for training a MOS prediction model.
19
Natural speech metadata and genre
• Professional speakers were rated as significantly more natural than non-professional
speakers
• Female speakers had a marginally-significantly (p=0.05) higher MOS than male speakers
• No significant differences between British and American speakers
• Genres: news, book, conversational
• News rated as most natural (MOS=4.36)
• Conversational: MOS=4.14
• Book: MOS=4.09 (significantly lower)
20
Speaker characteristics
• Standard Praat features:
• Minimum, maximum, mean, and standard deviation of f0 and energy
• NHR, jitter, shimmer
• Moderate negative correlations with MOS for shimmer (r=-0.46), NHR (r=-0.41), and
mean energy (r=-0.37)
• Moderate positive correlations with MOS for standard deviation of energy (r=0.41)
21
Effect of corpus on benchmark systems
• Moderate correlations for Festival
• Pearson r=0.33
• Spearman r=0.54
• Strong correlations for HTS
• Pearson r=0.87
• Spearman r=0.90
HTS output quality more
closely matches the quality
of the training data.
22
Effect of corpus on benchmark systems
• Moderate correlations for Festival
• Pearson r=0.33
• Spearman r=0.54
• Strong correlations for HTS
• Pearson r=0.87
• Spearman r=0.90
HTS output quality more
closely matches the quality
of the training data.
23
Discussion and Future Work
Discussion and future work
• We have a large dataset for training MOSnet-type systems
• Strong correlations with past listening tests
• Choice of speaker for training data is very important
• Will repeating the test with English listeners reveal language-dependent or cultural factors?
• Some systems have clear agreements whereas others have a wider distribution of scores.
• What makes certain systems so “controversial”?
• Are certain types of artifacts or unnaturalness more salient to some listeners than to others?
• Analysis of listener differences
• Incorporate variance of scores into MOSnet
24
Questions?
24

More Related Content

Similar to How do Voices from Past Speech Synthesis Challenges Compare Today?

2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071CS, NcState
 
Improving Requirements Glossary Construction via Clustering
Improving Requirements Glossary Construction via ClusteringImproving Requirements Glossary Construction via Clustering
Improving Requirements Glossary Construction via ClusteringLionel Briand
 
On QoE Metrics and QoE Fairness for Network & Traffic Management
On QoE Metrics and QoE Fairness for Network & Traffic ManagementOn QoE Metrics and QoE Fairness for Network & Traffic Management
On QoE Metrics and QoE Fairness for Network & Traffic ManagementTobias Hoßfeld
 
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014Welocalize
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionMartin Pinzger
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia VoulibasiISSEL
 
MOMDPSO_IDETC_2014_Weiyang
MOMDPSO_IDETC_2014_WeiyangMOMDPSO_IDETC_2014_Weiyang
MOMDPSO_IDETC_2014_WeiyangMDO_Lab
 
Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...IRJET Journal
 
Topic Set Size Design with Variance Estimates from Two-Way ANOVA
Topic Set Size Design with Variance Estimates from Two-Way ANOVATopic Set Size Design with Variance Estimates from Two-Way ANOVA
Topic Set Size Design with Variance Estimates from Two-Way ANOVATetsuya Sakai
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)Konstantinos Zagoris
 
Introduction to the software Predicsites©
Introduction to the software Predicsites©Introduction to the software Predicsites©
Introduction to the software Predicsites©Yoann Pageaud
 
Voice Recognition Eye Test
Voice Recognition Eye TestVoice Recognition Eye Test
Voice Recognition Eye TestIRJET Journal
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsMatīss ‎‎‎‎‎‎‎  
 
MODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATION
MODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATIONMODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATION
MODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATIONcscpconf
 
Modified Vortex Search Algorithm for Real Parameter Optimization
Modified Vortex Search Algorithm for Real Parameter Optimization Modified Vortex Search Algorithm for Real Parameter Optimization
Modified Vortex Search Algorithm for Real Parameter Optimization csandit
 

Similar to How do Voices from Past Speech Synthesis Challenges Compare Today? (20)

2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071
 
Improving Requirements Glossary Construction via Clustering
Improving Requirements Glossary Construction via ClusteringImproving Requirements Glossary Construction via Clustering
Improving Requirements Glossary Construction via Clustering
 
On QoE Metrics and QoE Fairness for Network & Traffic Management
On QoE Metrics and QoE Fairness for Network & Traffic ManagementOn QoE Metrics and QoE Fairness for Network & Traffic Management
On QoE Metrics and QoE Fairness for Network & Traffic Management
 
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
A Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug PredictionA Tale of Experiments on Bug Prediction
A Tale of Experiments on Bug Prediction
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
 
MOMDPSO_IDETC_2014_Weiyang
MOMDPSO_IDETC_2014_WeiyangMOMDPSO_IDETC_2014_Weiyang
MOMDPSO_IDETC_2014_Weiyang
 
Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...
 
Final
FinalFinal
Final
 
Topic Set Size Design with Variance Estimates from Two-Way ANOVA
Topic Set Size Design with Variance Estimates from Two-Way ANOVATopic Set Size Design with Variance Estimates from Two-Way ANOVA
Topic Set Size Design with Variance Estimates from Two-Way ANOVA
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Toshiaki Nakazawa - 2015 - Over of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa - 2015 - Over of the 2nd Workshop on Asian TranslationToshiaki Nakazawa - 2015 - Over of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa - 2015 - Over of the 2nd Workshop on Asian Translation
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
 
Introduction to the software Predicsites©
Introduction to the software Predicsites©Introduction to the software Predicsites©
Introduction to the software Predicsites©
 
Voice Recognition Eye Test
Voice Recognition Eye TestVoice Recognition Eye Test
Voice Recognition Eye Test
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
 
MODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATION
MODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATIONMODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATION
MODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATION
 
Modified Vortex Search Algorithm for Real Parameter Optimization
Modified Vortex Search Algorithm for Real Parameter Optimization Modified Vortex Search Algorithm for Real Parameter Optimization
Modified Vortex Search Algorithm for Real Parameter Optimization
 
ReComp for genomics
ReComp for genomicsReComp for genomics
ReComp for genomics
 

More from Yamagishi Laboratory, National Institute of Informatics, Japan

エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~Yamagishi Laboratory, National Institute of Informatics, Japan
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~Yamagishi Laboratory, National Institute of Informatics, Japan
 

More from Yamagishi Laboratory, National Institute of Informatics, Japan (17)

DDS: A new device-degraded speech dataset for speech enhancement
DDS: A new device-degraded speech dataset for speech enhancement DDS: A new device-degraded speech dataset for speech enhancement
DDS: A new device-degraded speech dataset for speech enhancement
 
The VoiceMOS Challenge 2022
The VoiceMOS Challenge 2022The VoiceMOS Challenge 2022
The VoiceMOS Challenge 2022
 
Analyzing Language-Independent Speaker Anonymization Framework under Unseen C...
Analyzing Language-Independent Speaker Anonymization Framework under Unseen C...Analyzing Language-Independent Speaker Anonymization Framework under Unseen C...
Analyzing Language-Independent Speaker Anonymization Framework under Unseen C...
 
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
 
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
 
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
 
Estimating the confidence of speech spoofing countermeasure
Estimating the confidence of speech spoofing countermeasureEstimating the confidence of speech spoofing countermeasure
Estimating the confidence of speech spoofing countermeasure
 
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
 
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio SynthesisText-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
 
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
 
Advancements in Neural Vocoders
Advancements in Neural VocodersAdvancements in Neural Vocoders
Advancements in Neural Vocoders
 
Neural Waveform Modeling
Neural Waveform Modeling Neural Waveform Modeling
Neural Waveform Modeling
 
Neural source-filter waveform model
Neural source-filter waveform modelNeural source-filter waveform model
Neural source-filter waveform model
 
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
 
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
 

Recently uploaded

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

How do Voices from Past Speech Synthesis Challenges Compare Today?