REAL TIME VOICE CLONING
PRESENTED BY
N.GAYATHRI PRIYA (19HQ1A0528)
L.RENUKA (19HQ1A0523)
K.SAI VARSHA (19HQ1A0519)
K.UMA MAHESWAR (19HQ1A0520)
K.BHANU TEJA (19HQ1A0508)
GUIDED BY
Y.GAYATRI
CONTENTS
• Abstract
• Introduction
• Existing System
• Proposed System
• System Architecture
• Best of Voice Cloning
• System Requirements
• Conclusion
ABSTRACT
• Deep learning models are becoming predominant in many fields of
machine learning. Text-to-Speech (TTS), the process of synthesizing
artificial speech from text, is no exception. To this end, a deep neural
network is usually trained using a corpus of several hours of recorded
speech from a single speaker. Producing the voice of a speaker other than
the one learned is expensive and requires considerable effort, since it
is necessary to record a new dataset and retrain the model. This is the
main reason why TTS models are usually single-speaker. The proposed
approach aims to overcome these limitations by building a system that
can model a multi-speaker acoustic space. This allows the generation of
speech audio similar to the voice of different target speakers, even if
they were not observed during the training phase.
INTRODUCTION
• Text-to-Speech (TTS) synthesis, the process of generating
natural speech from text, remains a challenging task despite
decades of investigation. Several current TTS systems achieve
impressive results, synthesizing natural voices very close to
human ones.
• Unfortunately, many of these systems learn to synthesize
speech with only a single voice. The goal of this work is to
build a TTS system that can generate natural speech in a
data-efficient manner for a wide variety of speakers, not
necessarily seen during the training phase.
• The activity that allows the creation of this type of model is
called voice cloning.
EXISTING SYSTEM
• As V2C is a new task, we briefly review several closely
related works in the fields of text-to-speech, voice cloning, and
prosody transfer. Many text-to-speech (TTS) synthesis methods
have been proposed to generate natural speech from text.
• Tacotron is a framework that integrates all the necessary stages
of text-to-speech synthesis, so that the speech synthesis model
can be optimized in an end-to-end manner.
• FastSpeech is a more efficient Transformer-based model that uses
non-autoregressive generation. Building on FastSpeech, the improved
FastSpeech 2 seeks to control the generated speech by adjusting
pitch and energy. However, the TTS task mainly focuses on how to
convert natural language text to speech with correct pronunciation.
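The difference between autoregressive and non-autoregressive generation mentioned above can be illustrated with a toy sketch. This is not Tacotron or FastSpeech code: the function names and the simple arithmetic "models" are hypothetical stand-ins, chosen only to show that one decoder must run step by step while the other can compute every frame independently.

```python
from typing import Callable, List


def autoregressive_decode(inputs: List[float],
                          step: Callable[[float, float], float]) -> List[float]:
    """Each output frame depends on the previous output, so frames are produced one by one."""
    outputs = []
    previous = 0.0  # assumed initial "go" frame
    for x in inputs:
        previous = step(x, previous)
        outputs.append(previous)
    return outputs


def non_autoregressive_decode(inputs: List[float],
                              frame_fn: Callable[[float], float]) -> List[float]:
    """Each output frame depends only on its input, so all frames could be computed in parallel."""
    return [frame_fn(x) for x in inputs]


# Toy stand-ins for the real networks (hypothetical, for illustration only).
ar_frames = autoregressive_decode([1.0, 2.0, 3.0], lambda x, prev: x + 0.5 * prev)
nar_frames = non_autoregressive_decode([1.0, 2.0, 3.0], lambda x: 2.0 * x)
print(ar_frames)
print(nar_frames)
```

In the autoregressive case the second frame cannot be computed before the first, which is the main source of slow inference that non-autoregressive models try to avoid.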
PROPOSED SYSTEM
• The synthesizer is Google's Tacotron 2 model, used without WaveNet.
Tacotron is a recurrent sequence-to-sequence model that predicts a mel
spectrogram from text.
• To build the encoder output frames, the embedded input characters are
processed by convolutional layers and then passed through a bidirectional LSTM.
• This is where SV2TTS modifies the architecture: the speaker embedding
is concatenated with each frame that the Tacotron encoder produces.
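The concatenation step can be sketched in a few lines. This is a minimal illustration using plain Python lists rather than the tensors a real implementation would use; the function name and the toy sizes (three frames of width 4, an embedding of width 2) are assumptions made for the example.

```python
from typing import List


def concat_speaker_embedding(encoder_frames: List[List[float]],
                             speaker_embedding: List[float]) -> List[List[float]]:
    """Append the same speaker embedding to every encoder output frame."""
    return [frame + speaker_embedding for frame in encoder_frames]


# Toy values, chosen only for illustration.
frames = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8],
          [0.9, 1.0, 1.1, 1.2]]
embedding = [0.25, -0.75]

conditioned = concat_speaker_embedding(frames, embedding)
# Still one frame per input position, but each frame is now wider (4 + 2 values).
print(len(conditioned), len(conditioned[0]))
```

Because the same embedding is attached to every frame, the decoder sees the speaker identity at every attention step, whichever part of the text it is attending to.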
SYSTEM ARCHITECTURE
MAIN COMPONENTS
The proposed system consists of three components:
1. Speaker Encoder: takes a short reference voice recording as input and
derives an embedding that captures the characteristics of the speaker's voice.
2. Synthesizer: takes the text as input and, conditioned on the speaker
embedding, generates a mel spectrogram of the text in the reference voice.
3. Neural Vocoder: finally, takes the mel spectrogram output by the synthesizer
and generates the speech waveform.
• The synthesized output is refined in a loop until it is clear and free of
noise; it then proceeds to the neural vocoder, which generates the speech
waveform.
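The data flow between the three numbered components can be sketched as a simple function composition. All names below are hypothetical, and the toy encoder, synthesizer, and vocoder are trivial arithmetic stand-ins for the trained neural networks; the sketch shows only the order of the stages and what each one passes to the next.

```python
from typing import Callable, List

Embedding = List[float]
MelSpectrogram = List[List[float]]
Waveform = List[float]


def clone_voice(reference_audio: List[float],
                text: str,
                encode_speaker: Callable[[List[float]], Embedding],
                synthesize: Callable[[str, Embedding], MelSpectrogram],
                vocode: Callable[[MelSpectrogram], Waveform]) -> Waveform:
    """Run the three stages in order: speaker encoder, synthesizer, vocoder."""
    embedding = encode_speaker(reference_audio)  # stage 1: voice -> embedding
    mel = synthesize(text, embedding)            # stage 2: text + embedding -> spectrogram
    return vocode(mel)                           # stage 3: spectrogram -> waveform


# Toy stand-ins (assumptions for illustration, not real models).
def toy_encoder(audio: List[float]) -> Embedding:
    return [sum(audio) / len(audio), max(abs(s) for s in audio)]


def toy_synthesizer(text: str, embedding: Embedding) -> MelSpectrogram:
    return [[ord(ch) / 255.0] + embedding for ch in text]


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    return [value for frame in mel for value in frame]


wave = clone_voice([0.0, 0.5, -0.5, 1.0], "hi",
                   toy_encoder, toy_synthesizer, toy_vocoder)
print(len(wave))
```

Passing the stages in as functions mirrors the fact that the three components are trained independently and can be swapped without changing the overall pipeline.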
BEST OF VOICE CLONING
The best voice cloning systems meet the following three key criteria:
1. Output quality: Real Time Voice Cloning produces clear, noise-free
speech from text.
2. Intuitive interface: the voice cloning application is easy to use.
3. Voice protections: the Real Time Voice Cloning application provides an
interface with many user privacy features.
SYSTEM REQUIREMENTS
Software Requirements :
✓ Windows 10 or Ubuntu 20.04+ operating
system
Hardware Requirements :
✓ 5GB+ Disk space
✓ NVIDIA GPU with at least 4GB of memory
& driver version 456.38+ (optional)
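The disk space requirement can be checked before installation with a short script. This is a minimal sketch using only the Python standard library; the function name and the default path are assumptions, and it does not check the operating system, GPU, or driver version.

```python
import shutil


def has_enough_disk(path: str = ".", required_gb: float = 5.0) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if has_enough_disk(".", 5.0):
    print("Enough free disk space for the 5GB+ requirement.")
else:
    print("Less than 5GB free; free up space before installing.")
```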
CONCLUSION
• In this work, our goal was to build a voice
cloning system that could generate natural
speech for a variety of target speakers in a
data-efficient manner. Our system combines an
independently trained speaker encoder network
with a sequence-to-sequence architecture with
attention and a neural vocoder model.
• Using a transfer learning technique from a
speaker-discriminative encoder model based on
utterance embeddings rather than speaker
embeddings, the synthesizer and the vocoder are
able to generate good-quality speech even for
speakers not observed during training.
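One way to read "speaker-discriminative" is that embeddings of utterances from the same speaker should lie closer together than embeddings of utterances from different speakers. The sketch below illustrates this with cosine similarity, a common closeness measure for such embeddings. The three-dimensional vectors are made-up values for illustration, not outputs of a real encoder.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up utterance embeddings (hypothetical values).
speaker_a_utt1 = [0.9, 0.1, 0.0]
speaker_a_utt2 = [0.8, 0.2, 0.1]
speaker_b_utt1 = [0.1, 0.2, 0.9]

same_speaker = cosine_similarity(speaker_a_utt1, speaker_a_utt2)
different_speaker = cosine_similarity(speaker_a_utt1, speaker_b_utt1)
print(same_speaker > different_speaker)
```

An encoder with this property can place a previously unseen speaker in the embedding space from a single short utterance, which is what lets the synthesizer and vocoder generalize to new voices.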
THANK YOU
