SlideShare a Scribd company logo
A Revisit to Feature Handling

for High-quality Voice Conversion

Based on Gaussian Mixture Model
Hitoshi Suda, Gaku Kotani,

Shinnosuke Takamichi, and Daisuke Saito
(The University of Tokyo)
APSIPA ASC 2018 / Nov. 14, 2018 No. 1370 / WE-A2-8.3
/ 24
Voice Conversion
2
Modifying personalities
in the utterance
Source
Speaker
Target
Speaker
/ 24
Voice Conversion Framework
• Training
• Conversion
3
Source Filtering Output
AnalysisSource
Conversion
diff
AnalysisSource
AnalysisSource
AnalysisTarget
Conversion Model
DTW

and Joint
/ 24
Voice Conversion Framework
• Training
• Conversion
4
Source Filtering Output
AnalysisSource
Conversion
diff
AnalysisSource
AnalysisSource
AnalysisTarget
Conversion Model
DTW

and Joint
/ 24
Voice Conversion Framework
• Training
• Conversion
5
Source Filtering Output
AnalysisSource
Conversion
diff
AnalysisSource
AnalysisSource
AnalysisTarget
Conversion Model
DTW

and Joint
/ 24
Aim of This Study






• To improve conversion quality

of existing voice conversion frameworks
• To experimentally reveal influences of

feature handling
• via subjective experiments
6
A Revisit to Feature Handling

for High-quality Voice Conversion

Based on Gaussian Mixture Model
/ 24
Experimental Setups
• 50 sentences from ATR Japanese phonetically
balanced sentence sets [Kurematsu+, 1990]
• 40 for training, 10 for evaluation
• Sampling frequency: 22050 Hz
• Analysis and synthesis: WORLD [Morise+, 2016]
• Speakers: 2 males and 2 females
• Only intra-gender conversion / No F0 conversion
• 23 listeners answered questions in each preference
test via our crowdsourcing system
7
/ 24
3 Experiments
1. Analysis conditions
2. Conversion system without statistical mapping
3. Total conversion system
8
AnalysisSource Synthesis Output
No conversion
Source Filtering Output
AnalysisSource
Conversion
diff
AnalysisSource
Source Filtering Output
AnalysisSource
AnalysisTarget
diff
DTW
/ 24
Experiment 1: Analysis Conditions
• To reveal the effects of conditions of analysis
• Frame periods (or frame shifts)
• How precisely the waveforms are analyzed
in time domain
• Order of mel-cepstral coefficients (mcep)
• How precisely the spectral envelopes are
represented
9
AnalysisSource Synthesis Output
No conversion
/ 24
Experiment 1: Analysis Conditions
• Frame periods:
5 ms < 1 ms

1 ms ≈ 500 μs

500 μs ≤ 50 μs
10
p < 0.01
p < 0.5
Frameperiod
Preference score
0% 25% 50% 75% 100%
50 µs
500 µs
1 ms
500 µs
1 ms
5 ms
p < 0.1
/ 24
Experiment 1: Analysis Conditions
• Order of mcep:
with 1 ms analysis

24 < 39

39 ≈ 59

59 < 79

79 ≈ 99
with 50 μs analysis

24 < 39

39 ≈ 59

59 ≈ 79

79 ≈ 99
11
Orderofmel-cepstral
coefficients
Preference score
0% 25% 50% 75% 100%
99
79
59
39
79
59
39
24 p < 10–7
p < 1.0
p < 1.0
p < 1.0
Orderofmel-cepstral
coefficients
Preference score
0% 25% 50% 75% 100%
99
79
59
39
79
59
39
24 p < 10–10
p < 0.5
p < 0.01
p = 1
/ 24
Experiment 1: Analysis Conditions
• Frame periods:

5 ms < 1 ms ≈ 500 μs ≤ 50 μs
• Order of mcep:

24 < 39 ≈ 59 < 79 ≈ 99 (with 1 ms frames)

24 < 39 ≈ 59 ≈ 79 ≈ 99 (with 50 μs frames)
12
/ 24
3 Experiments
1. Analysis conditions
2. Conversion system without statistical mapping
3. Total conversion system
13
AnalysisSource Synthesis Output
No conversion
Source Filtering Output
AnalysisSource
AnalysisTarget
diff
Source Filtering Output
AnalysisSource
Conversion
diff
AnalysisSource
DTW
/ 24
Differential-spectrum Compensation
• Famous implementation: Mel log spectrum
approximation (MLSA) Filtering [Imai+, 1983]
• We introduce a diffspec method “SP-WORLD”

inspired by WORLD vocoder
14
(diffspec) [Kobayashi+, 2014]
Filtering
(MLSA or SP-WORLD)
Output
waveforms
Source
waveforms
Difference of
spectra
/ 24
Dynamic Time Warping (DTW)
• Alignment of features
• Sensitive to difference

of individuality
• We introduce

“Affine-DTW”
• Iteration of alignment

and coarse conversion
15
Source speaker [s]
Targetspeaker[s]
0 0.3 0.6 0.9 1.2 1.5 1.8 2.1
0
0.3
0.6
0.9
1.2
1.5
1.8
2.1
2.4
2.7 non-affine
1-affine
2-affine
3-affine
/ 24
Experiment 2: Ideal Conversion
• To reveal the effects on quality of conversion

of the conditions except mapping models
• Diffspec method: MLSA or SP-WORLD
• Analysis conditions
• Frame period and order of mcep
16
Source Filtering Output
AnalysisSource
AnalysisTarget
diff
Affine-DTW
/ 24
Experiment 2: Ideal Conversion
• Diffspec method:
MLSA ≤ SP-WORLD

MLSA < SP-WORLD

MLSA < SP-WORLD

MLSA < SP-WORLD
17
24 / 5 ms
24 / 1 ms
79 / 5 ms
79 / 1 ms
Preference score
0% 25% 50% 75% 100%
SP-WORLDMLSA p < 0.05
p < 0.01
p < 10–3
p < 10–4
/ 24
Experiment 2: Ideal Conversion
• Order of mcep:
24 ≈ 79

24 ≈ 79

24 > 79

24 > 79
18
5 ms / MLSA
1 ms / MLSA
5 ms / SP-WORLD
1 ms / SP-WORLD
Preference score
0% 25% 50% 75% 100%
7924 p < 0.5
p < 1.0
p < 0.05
p < 10–3
/ 24
Experiment 2: Ideal Conversion
• Frame periods:
5 ms ≈ 1 ms

5 ms ≈ 1 ms

5 ms ≈ 1 ms

5 ms ≥ 1 ms
19
24 / MLSA
79 / MLSA
24 / SP-WORLD
79 / SP-WORLD
Preference score
0% 25% 50% 75% 100%
1 ms5 ms p < 0.5
p < 0.5
p < 0.5
p < 0.05
/ 24
Experiment 2: Ideal Conversion
• Diffspec method: MLSA < SP-WORLD
• Order of mcep: 24 > 79 (with SP-WORLD)

24 ≈ 79 (with MLSA)
• Frame periods:

5 ms ≥ 1 ms (with SP-WORLD / 79-order)

5 ms ≈ 1 ms (with other conditions)
20
/ 24
3 Experiments
1. Analysis conditions
2. Conversion system without statistical mapping
3. Total conversion system
21
AnalysisSource Synthesis Output
No conversion
Source Filtering Output
AnalysisSource
Conversion
diff
AnalysisSource
Source Filtering Output
AnalysisSource
AnalysisTarget
diff
DTW
/ 24
Experiment 3: Statistical Conversion
• To reveal the influences of below components
• Diffspec method: MLSA or SP-WORLD
• Sequence features [Toda+, 2007]
• Dynamic features and global variances
• 1 ms period / 24-order of mcep
22
Source Filtering Output
AnalysisSource
Conversion
diff
AnalysisSource
/ 24
Experiment 3: Statistical Conversion
• Diffspec method: MLSA ≈ SP-WORLD



• Sequence features:

static < static+dynamic < static+dynamic+GV
23
MLSA
SP-WORLD
MLSA
SP-WORLD
Preference score
0% 25% 50% 75% 100%
S+D+GV
S+D+GV
S+D
S+D
S+D
S+D
S
S p < 10–18
p < 10–11
p < 10–10
p < 10–4
MLSA
SP-WORLD
MLSA
SP-WORLD
Preference score
0% 25% 50% 75% 100%
S+D+GV
S+D+GV
S+D
S+D
S+D
S+D
S
S p < 10–71
p < 10–42
p < 0.01
p < 0.01
speaker similarity naturalness
Preference score
0% 25% 50% 75% 100%
SP-WORLDMLSA p < 0.5
Preference score
0% 25% 50% 75% 100%
SP-WORLDMLSA p < 0.5
speaker similarity naturalness
/ 24
Conclusion
• In GMM-based statistical voice conversion,
• Dynamic features and GV: definitely effective
• SP-WORLD: comparable to MLSA
• Superior in ideal conversion
• Higher order of features: not always effective
• High time-resolution analysis is effective in analysis-synthesis
• Potential of effectiveness also in conversion
Future Works
• F0 conversion
• Break the 1 ms barrier in WORLD analysis
• Other mapping models such as neural networks
24

More Related Content

Similar to APSIPA2018: A revisit to feature handling for high-quality voice conversion

AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012
OptiModel
 
sawsan-mohamed-amer-cairo-university-egypt.pptx
sawsan-mohamed-amer-cairo-university-egypt.pptxsawsan-mohamed-amer-cairo-university-egypt.pptx
sawsan-mohamed-amer-cairo-university-egypt.pptx
EhabAdel29
 
Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...
Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...
Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...
Lionel Briand
 
When a FILTER makes the di fference in continuously answering SPARQL queries ...
When a FILTER makes the difference in continuously answering SPARQL queries ...When a FILTER makes the difference in continuously answering SPARQL queries ...
When a FILTER makes the di fference in continuously answering SPARQL queries ...
Shima Zahmatkesh
 
A Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature SelectionA Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature Selection
Davide Nardone
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Lionel Briand
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological Studies
Dmitry Grapov
 
Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015
Thomas Bagley
 
Thesis Defense SI With S&T template 2.25.2016
Thesis Defense SI With S&T template 2.25.2016Thesis Defense SI With S&T template 2.25.2016
Thesis Defense SI With S&T template 2.25.2016
Michael Shixuan Meng
 
FINAL-THESIS-MENG-02-29-2016
FINAL-THESIS-MENG-02-29-2016FINAL-THESIS-MENG-02-29-2016
FINAL-THESIS-MENG-02-29-2016
Michael Shixuan Meng
 
Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)
Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)
Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)
Bibhuti Prasad Nanda
 
MiL Testing of Highly Configurable Continuous Controllers
MiL Testing of Highly Configurable Continuous ControllersMiL Testing of Highly Configurable Continuous Controllers
MiL Testing of Highly Configurable Continuous Controllers
Lionel Briand
 
ASS_SDM2012_Ali
ASS_SDM2012_AliASS_SDM2012_Ali
ASS_SDM2012_Ali
MDO_Lab
 
Sparsenet
SparsenetSparsenet
Sparsenet
ndronen
 
The Analytical Method Transfer Process SK-Sep 2013
The Analytical Method Transfer Process  SK-Sep 2013The Analytical Method Transfer Process  SK-Sep 2013
The Analytical Method Transfer Process SK-Sep 2013
Stephan O. Krause, PhD
 
Attribute MSA
Attribute MSA Attribute MSA
Attribute MSA
dishashah4993
 
Attribute MSA
Attribute MSAAttribute MSA
Attribute MSA
dishashah4993
 
4 26 2013 1 IME 674 Quality Assurance Reliability EXAM TERM PROJECT INFO...
4 26 2013 1 IME 674  Quality Assurance   Reliability EXAM   TERM PROJECT INFO...4 26 2013 1 IME 674  Quality Assurance   Reliability EXAM   TERM PROJECT INFO...
4 26 2013 1 IME 674 Quality Assurance Reliability EXAM TERM PROJECT INFO...
Robin Beregovska
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
Databricks
 
NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...
NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...
NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...
Modelon
 

Similar to APSIPA2018: A revisit to feature handling for high-quality voice conversion (20)

AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012
 
sawsan-mohamed-amer-cairo-university-egypt.pptx
sawsan-mohamed-amer-cairo-university-egypt.pptxsawsan-mohamed-amer-cairo-university-egypt.pptx
sawsan-mohamed-amer-cairo-university-egypt.pptx
 
Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...
Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...
Testing Dynamic Behavior in Executable Software Models - Making Cyber-physica...
 
When a FILTER makes the di fference in continuously answering SPARQL queries ...
When a FILTER makes the difference in continuously answering SPARQL queries ...When a FILTER makes the difference in continuously answering SPARQL queries ...
When a FILTER makes the di fference in continuously answering SPARQL queries ...
 
A Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature SelectionA Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature Selection
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven Strategies
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological Studies
 
Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015Bagley_HNRS_CRM_talk_2015
Bagley_HNRS_CRM_talk_2015
 
Thesis Defense SI With S&T template 2.25.2016
Thesis Defense SI With S&T template 2.25.2016Thesis Defense SI With S&T template 2.25.2016
Thesis Defense SI With S&T template 2.25.2016
 
FINAL-THESIS-MENG-02-29-2016
FINAL-THESIS-MENG-02-29-2016FINAL-THESIS-MENG-02-29-2016
FINAL-THESIS-MENG-02-29-2016
 
Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)
Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)
Six sigma-in-measurement-systems-evaluating-the-hidden-factory (2)
 
MiL Testing of Highly Configurable Continuous Controllers
MiL Testing of Highly Configurable Continuous ControllersMiL Testing of Highly Configurable Continuous Controllers
MiL Testing of Highly Configurable Continuous Controllers
 
ASS_SDM2012_Ali
ASS_SDM2012_AliASS_SDM2012_Ali
ASS_SDM2012_Ali
 
Sparsenet
SparsenetSparsenet
Sparsenet
 
The Analytical Method Transfer Process SK-Sep 2013
The Analytical Method Transfer Process  SK-Sep 2013The Analytical Method Transfer Process  SK-Sep 2013
The Analytical Method Transfer Process SK-Sep 2013
 
Attribute MSA
Attribute MSA Attribute MSA
Attribute MSA
 
Attribute MSA
Attribute MSAAttribute MSA
Attribute MSA
 
4 26 2013 1 IME 674 Quality Assurance Reliability EXAM TERM PROJECT INFO...
4 26 2013 1 IME 674  Quality Assurance   Reliability EXAM   TERM PROJECT INFO...4 26 2013 1 IME 674  Quality Assurance   Reliability EXAM   TERM PROJECT INFO...
4 26 2013 1 IME 674 Quality Assurance Reliability EXAM TERM PROJECT INFO...
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
 
NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...
NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...
NONLINEAR MODEL PREDICTIVE CONTROL FOR OPERATION OF A POST COMBUSTION ABSORPT...
 

More from Shinnosuke Takamichi

JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパスJTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
Shinnosuke Takamichi
 
音声合成のコーパスをつくろう
音声合成のコーパスをつくろう音声合成のコーパスをつくろう
音声合成のコーパスをつくろう
Shinnosuke Takamichi
 
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパスJ-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
Shinnosuke Takamichi
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討
Shinnosuke Takamichi
 
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
Shinnosuke Takamichi
 
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
Shinnosuke Takamichi
 
国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告
Shinnosuke Takamichi
 
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Shinnosuke Takamichi
 
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
Shinnosuke Takamichi
 
P J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパスP J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパス
Shinnosuke Takamichi
 
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
Shinnosuke Takamichi
 
音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン
Shinnosuke Takamichi
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamforming
Shinnosuke Takamichi
 
論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus
Shinnosuke Takamichi
 
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
Shinnosuke Takamichi
 
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
Shinnosuke Takamichi
 
JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス
Shinnosuke Takamichi
 
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
Shinnosuke Takamichi
 
音声合成・変換の国際コンペティションへの 参加を振り返って
音声合成・変換の国際コンペティションへの  参加を振り返って音声合成・変換の国際コンペティションへの  参加を振り返って
音声合成・変換の国際コンペティションへの 参加を振り返って
Shinnosuke Takamichi
 
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-trackingユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
Shinnosuke Takamichi
 

More from Shinnosuke Takamichi (20)

JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパスJTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
 
音声合成のコーパスをつくろう
音声合成のコーパスをつくろう音声合成のコーパスをつくろう
音声合成のコーパスをつくろう
 
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパスJ-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討
 
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
 
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
 
国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告
 
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
 
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
 
P J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパスP J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパス
 
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
 
音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamforming
 
論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus
 
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
 
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
 
JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス
 
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
 
音声合成・変換の国際コンペティションへの 参加を振り返って
音声合成・変換の国際コンペティションへの  参加を振り返って音声合成・変換の国際コンペティションへの  参加を振り返って
音声合成・変換の国際コンペティションへの 参加を振り返って
 
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-trackingユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
 

Recently uploaded

20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 

Recently uploaded (20)

20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 

APSIPA2018: A revisit to feature handling for high-quality voice conversion

  • 1. A Revisit to Feature Handling
 for High-quality Voice Conversion
 Based on Gaussian Mixture Model Hitoshi Suda, Gaku Kotani,
 Shinnosuke Takamichi, and Daisuke Saito (The University of Tokyo) APSIPA ASC 2018 / Nov. 14, 2018 No. 1370 / WE-A2-8.3
  • 2. / 24 Voice Conversion 2 Modifying personalities in the utterance Source Speaker Target Speaker
  • 3. / 24 Voice Conversion Framework • Training • Conversion 3 Source Filtering Output AnalysisSource Conversion diff AnalysisSource AnalysisSource AnalysisTarget Conversion Model DTW
 and Joint
  • 4. / 24 Voice Conversion Framework • Training • Conversion 4 Source Filtering Output AnalysisSource Conversion diff AnalysisSource AnalysisSource AnalysisTarget Conversion Model DTW
 and Joint
  • 5. / 24 Voice Conversion Framework • Training • Conversion 5 Source Filtering Output AnalysisSource Conversion diff AnalysisSource AnalysisSource AnalysisTarget Conversion Model DTW
 and Joint
  • 6. / 24 Aim of This Study 
 
 
 • To improve conversion quality
 of existing voice conversion frameworks • To experimentally reveal influences of
 feature handling • via subjective experiments 6 A Revisit to Feature Handling
 for High-quality Voice Conversion
 Based on Gaussian Mixture Model
  • 7. / 24 Experimental Setups • 50 sentences from ATR Japanese phonetically balanced sentence sets [Kurematsu+, 1990] • 40 for training, 10 for evaluation • Sampling frequency: 22050 Hz • Analysis and synthesis: WORLD [Morise+, 2016] • Speakers: 2 males and 2 females • Only intra-gender conversion / No F0 conversion • 23 listeners answered questions in each preference test via our crowdsourcing system 7
  • 8. / 24 3 Experiments 1. Analysis conditions 2. Conversion system without statistical mapping 3. Total conversion system 8 AnalysisSource Synthesis Output No conversion Source Filtering Output AnalysisSource Conversion diff AnalysisSource Source Filtering Output AnalysisSource AnalysisTarget diff DTW
  • 9. / 24 Experiment 1: Analysis Conditions • To reveal the effects of conditions of analysis • Frame periods (or frame shifts) • How precisely the waveforms are analyzed in time domain • Order of mel-cepstral coefficients (mcep) • How precisely the spectral envelopes are represented 9 AnalysisSource Synthesis Output No conversion
  • 10. / 24 Experiment 1: Analysis Conditions • Frame periods: 5 ms < 1 ms
 1 ms ≈ 500 μs
 500 μs ≤ 50 μs 10 p < 0.01 p < 0.5 Frameperiod Preference score 0% 25% 50% 75% 100% 50 µs 500 µs 1 ms 500 µs 1 ms 5 ms p < 0.1
  • 11. / 24 Experiment 1: Analysis Conditions • Order of mcep: with 1 ms analysis
 24 < 39
 39 ≈ 59
 59 < 79
 79 ≈ 99 with 50 μs analysis
 24 < 39
 39 ≈ 59
 59 ≈ 79
 79 ≈ 99 11 Orderofmel-cepstral coefficients Preference score 0% 25% 50% 75% 100% 99 79 59 39 79 59 39 24 p < 10–7 p < 1.0 p < 1.0 p < 1.0 Orderofmel-cepstral coefficients Preference score 0% 25% 50% 75% 100% 99 79 59 39 79 59 39 24 p < 10–10 p < 0.5 p < 0.01 p = 1
  • 12. / 24 Experiment 1: Analysis Conditions • Frame periods:
 5 ms < 1 ms ≈ 500 μs ≤ 50 μs • Order of mcep:
 24 < 39 ≈ 59 < 79 ≈ 99 (with 1 ms frames)
 24 < 39 ≈ 59 ≈ 79 ≈ 99 (with 50 μs frames) 12
  • 13. / 24 3 Experiments 1. Analysis conditions 2. Conversion system without statistical mapping 3. Total conversion system 13 AnalysisSource Synthesis Output No conversion Source Filtering Output AnalysisSource AnalysisTarget diff Source Filtering Output AnalysisSource Conversion diff AnalysisSource DTW
  • 14. / 24 Differential-spectrum Compensation • Famous implementation: Mel log spectrum approximation (MLSA) Filtering [Imai+, 1983] • We introduce a diffspec method “SP-WORLD”
 inspired by WORLD vocoder 14 (diffspec) [Kobayashi+, 2014] Filtering (MLSA or SP-WORLD) Output waveforms Source waveforms Difference of spectra
  • 15. / 24 Dynamic Time Warping (DTW) • Alignment of features • Sensitive to difference
 of individuality • We introduce
 “Affine-DTW” • Iteration of alignment
 and coarse conversion 15 Source speaker [s] Targetspeaker[s] 0 0.3 0.6 0.9 1.2 1.5 1.8 2.1 0 0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4 2.7 non-affine 1-affine 2-affine 3-affine
  • 16. / 24 Experiment 2: Ideal Conversion • To reveal the effects on quality of conversion
 of the conditions except mapping models • Diffspec method: MLSA or SP-WORLD • Analysis conditions • Frame period and order of mcep 16 Source Filtering Output AnalysisSource AnalysisTarget diff Affine-DTW
  • 17. / 24 Experiment 2: Ideal Conversion • Diffspec method: MLSA ≤ SP-WORLD
 MLSA < SP-WORLD
 MLSA < SP-WORLD
 MLSA < SP-WORLD 17 24 / 5 ms 24 / 1 ms 79 / 5 ms 79 / 1 ms Preference score 0% 25% 50% 75% 100% SP-WORLDMLSA p < 0.05 p < 0.01 p < 10–3 p < 10–4
  • 18. / 24 Experiment 2: Ideal Conversion • Order of mcep: 24 ≈ 79
 24 ≈ 79
 24 > 79
 24 > 79 18 5 ms / MLSA 1 ms / MLSA 5 ms / SP-WORLD 1 ms / SP-WORLD Preference score 0% 25% 50% 75% 100% 7924 p < 0.5 p < 1.0 p < 0.05 p < 10–3
  • 19. / 24 Experiment 2: Ideal Conversion • Frame periods: 5 ms ≈ 1 ms
 5 ms ≈ 1 ms
 5 ms ≈ 1 ms
 5 ms ≥ 1 ms 19 24 / MLSA 79 / MLSA 24 / SP-WORLD 79 / SP-WORLD Preference score 0% 25% 50% 75% 100% 1 ms5 ms p < 0.5 p < 0.5 p < 0.5 p < 0.05
  • 20. / 24 Experiment 2: Ideal Conversion • Diffspec method: MLSA < SP-WORLD • Order of mcep: 24 > 79 (with SP-WORLD)
 24 ≈ 79 (with MLSA) • Frame periods:
 5 ms ≥ 1 ms (with SP-WORLD / 79-order)
 5 ms ≈ 1 ms (with other conditions) 20
  • 21. / 24 3 Experiments 1. Analysis conditions 2. Conversion system without statistical mapping 3. Total conversion system 21 AnalysisSource Synthesis Output No conversion Source Filtering Output AnalysisSource Conversion diff AnalysisSource Source Filtering Output AnalysisSource AnalysisTarget diff DTW
  • 22. / 24 Experiment 3: Statistical Conversion • To reveal the influences of below components • Diffspec method: MLSA or SP-WORLD • Sequence features [Toda+, 2007] • Dynamic features and global variances • 1 ms period / 24-order of mcep 22 Source Filtering Output AnalysisSource Conversion diff AnalysisSource
  • 23. / 24 Experiment 3: Statistical Conversion • Diffspec method: MLSA ≈ SP-WORLD
 
 • Sequence features:
 static < static+dynamic < static+dynamic+GV 23 MLSA SP-WORLD MLSA SP-WORLD Preference score 0% 25% 50% 75% 100% S+D+GV S+D+GV S+D S+D S+D S+D S S p < 10–18 p < 10–11 p < 10–10 p < 10–4 MLSA SP-WORLD MLSA SP-WORLD Preference score 0% 25% 50% 75% 100% S+D+GV S+D+GV S+D S+D S+D S+D S S p < 10–71 p < 10–42 p < 0.01 p < 0.01 speaker similarity naturalness Preference score 0% 25% 50% 75% 100% SP-WORLDMLSA p < 0.5 Preference score 0% 25% 50% 75% 100% SP-WORLDMLSA p < 0.5 speaker similarity naturalness
  • 24. / 24 Conclusion • In GMM-based statistical voice conversion, • Dynamic features and GV: definitely effective • SP-WORLD: comparable to MLSA • Superior in ideal conversion • Higher order of features: not always effective • High time-resolution analysis is effective in analysis-synthesis • Potential of effectiveness also in conversion Future Works • F0 conversion • Break the 1 ms barrier in WORLD analysis • Other mapping models such as neural networks 24