The document describes a real-time DNN-based voice conversion system that uses feedback to help speakers acquire a target character's traits. It proposes feeding the converted voice back to the speaker in real time to encourage them to modify their speech (prosody and emphasis) toward the target speaker's character. Subjective evaluations from both the first-person (user) and third-person perspectives found that the system improved reproduction of the target speaker's character, especially for inexperienced users. Pitch feedback alone was already quite effective.
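Pitch-only feedback implies mapping the user's F0 into the target speaker's range frame by frame. The summary does not say how the system does this. A common voice-conversion technique, used here purely as an assumption, is a linear transformation of log F0 using each speaker's log-F0 mean and standard deviation. The function names below are hypothetical, and F0 extraction itself is out of scope.

```python
import math

def log_f0_stats(f0_values):
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    voiced = [math.log(f) for f in f0_values if f > 0.0]
    if not voiced:
        raise ValueError("no voiced frames")
    mean = sum(voiced) / len(voiced)
    var = sum((v - mean) ** 2 for v in voiced) / len(voiced)
    return mean, math.sqrt(var)

def convert_f0(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Map source F0 values (Hz) into the target speaker's log-F0 range.

    Assumption: a mean/variance linear transform in the log domain.
    Unvoiced frames (F0 == 0) are passed through unchanged.
    """
    if src_std <= 0.0:
        raise ValueError("source log-F0 std must be positive")
    out = []
    for f0 in f0_src:
        if f0 <= 0.0:
            out.append(0.0)  # keep unvoiced frames unvoiced
            continue
        z = (math.log(f0) - src_mean) / src_std  # normalize w.r.t. source speaker
        out.append(math.exp(tgt_mean + tgt_std * z))  # rescale to target speaker
    return out
```

Because each frame is converted independently, the statistics can be computed offline and `convert_f0` applied to every frame as soon as its F0 is extracted, which keeps the added latency to a single frame.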
The document describes the NAIST Text-to-Speech system developed for the Blizzard Challenge 2015. The system uses an HMM-based approach with four main modules: text processing, speech processing, training, and synthesis. New functions include parameter trajectory smoothing based on modulation spectrum analysis in the speech processing module and the use of the modulation spectrum in the synthesis module. Evaluation results show that the system ranked highly in naturalness and intelligibility for Marathi.
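The modulation spectrum of a speech parameter trajectory is the power spectrum of that parameter's sequence over time. One way to smooth a trajectory with it is to remove modulation-frequency components above a cutoff. The sketch below follows that idea. It is an illustration only, not the NAIST system's actual procedure. The cutoff value and all function names are hypothetical, and a naive O(n²) DFT is used so the example stays dependency-free.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning only the real part."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def modulation_spectrum(traj):
    """Power of each modulation-frequency bin 0..n//2 of a parameter trajectory."""
    X = dft(traj)
    return [abs(X[k]) ** 2 for k in range(len(traj) // 2 + 1)]

def smooth_trajectory(traj, frame_rate_hz, cutoff_hz):
    """Zero modulation-frequency components above cutoff_hz and resynthesize.

    frame_rate_hz is the number of parameter frames per second
    (e.g. 200 for a 5 ms frame shift); cutoff_hz is a hypothetical setting.
    """
    n = len(traj)
    X = dft(traj)
    for k in range(n):
        # Bins k and n - k share the same modulation frequency, so both are
        # zeroed together and the resynthesized trajectory stays real.
        freq = min(k, n - k) * frame_rate_hz / n
        if freq > cutoff_hz:
            X[k] = 0j
    return idft(X)
```

For example, with a 200-frame trajectory at 200 frames per second and a 10 Hz cutoff, a 40 Hz ripple is removed while a 2 Hz contour and the mean are preserved. A practical implementation would use an FFT and often works with log power.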
Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input, Shinnosuke Takamichi
In many situations, such as TV narration and speech-based creative work, you may want to control the prosody or pronunciation of synthetic speech. This method lets you control synthetic speech using your own voice.
1) The document proposes a training algorithm for DNN-based speech synthesis that deceives anti-spoofing verification. It trains the acoustic models iteratively, alternating between updating the acoustic models and updating the anti-spoofing discriminator.
2) The algorithm aims to improve speech quality by using adversarial training to compensate for the difference between the distributions of natural and generated speech parameters.
3) Evaluation results show that the algorithm improves speech quality over conventional training while also training the models to effectively deceive the anti-spoofing system. The quality gains are robust to hyperparameter settings.
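The losses behind this alternating update can be sketched as follows. This is a minimal stdlib-only sketch that assumes a standard GAN-style cross-entropy formulation. The discriminator outputs the probability that a frame is natural. It is updated to minimize `discriminator_loss` while the acoustic model is held fixed. The acoustic model is then updated to minimize `generator_loss`, which is the minimum generation error (MGE) loss plus a weighted adversarial term. `weight` is a hypothetical name for the hyperparameter balancing the two terms, and the formulation is not claimed to be the paper's exact objective.

```python
import math

EPS = 1e-12  # guards against log(0)

def mge_loss(natural, generated):
    """Mean squared error between natural and generated parameter vectors."""
    total = 0.0
    count = 0
    for y, y_hat in zip(natural, generated):
        for a, b in zip(y, y_hat):
            total += (a - b) ** 2
            count += 1
    return total / count

def discriminator_loss(d_natural, d_generated):
    """Cross-entropy for a discriminator that outputs P(natural) per frame."""
    loss = -sum(math.log(p + EPS) for p in d_natural) / len(d_natural)
    loss += -sum(math.log(1.0 - p + EPS) for p in d_generated) / len(d_generated)
    return loss

def adversarial_loss(d_generated):
    """Low when the discriminator labels generated frames as natural."""
    return -sum(math.log(p + EPS) for p in d_generated) / len(d_generated)

def generator_loss(natural, generated, d_generated, weight):
    """MGE loss plus a weighted adversarial term (weight is hypothetical)."""
    return mge_loss(natural, generated) + weight * adversarial_loss(d_generated)
```

Setting `weight` to 0 recovers conventional MGE training. Larger values push the acoustic model harder toward outputs that the discriminator accepts as natural.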
The document discusses non-academic career paths for PhDs. It outlines the types of non-academic careers, including private-sector jobs, government and non-profit roles, education, and social entrepreneurship. Motivations for pursuing a non-academic career include financial issues, such as low postdoc salaries and a lack of funding, as well as emotional reasons. Pursuing a non-academic career involves networking, both formally (through universities and job postings) and informally (through personal connections), to connect science and society.