SlideShare a Scribd company logo
INTroduction
1. INTroduction
DDS: A new device-degraded speech dataset for speech enhancement
Haoyu Li1,2, Junichi Yamagishi1,2
1National Institute of Informatics, Japan 2SOKENDAI, Japan
INTroduction
Conclusion
Introduction
(1) Noisy speech dataset used for training neural SE model
(2) Many existing dataset generated by simulation
(3) Model trained on simulated dataset degrades on real recordings
a) Real recordings are degraded by multiple joint factors
b) RIRs cannot capture the nonlinear distortion
(1) Collect real noisy recordings by consumer-grade devices under
uncontrolled environments (e.g., using iPhone at home)
(2) Release such a parallel noisy speech dataset to facilitate research in
speech enhancement
Overview of DDS Initial analysis
Goals
Collection Setup
Wed-P-OS-6-1
441
INTERSPEECH 2022
Dataset access
Links
We release a large-scale device-degraded speech (DDS) dataset with 2,000 hours of real
recordings collected under 27 conditions spanning 9 realistic rooms and 3 devices
Average PESQ and ESTOI on different conditions
Baseline results on test set
 D1: 32 hours speech extracted from matched room and
matched mic (closed-set)
 D2: 32 hours speech extracted from one unmatched room
and one unmatched device
 D3: 32 hours speech extracted from one unmatched room
and two unmatched device
 D4: 32 hours speech extracted from four unmatched room
and two unmatched device
 D5: 32 hours speech extracted from eight unmatched room
and two unmatched device
Nealy 2,000 hours of speech data collected in 27 conditions:
 2 speech materials: DAPS (4 hours) and VCTK subset (8 hours)
 9 realistic rooms
 3 microphone devices
 6 recording positions to collect speech at various noise and
reverberation levels
(1) Play clean speech recordings using high-quality monitor speaker
and re-record waveforms on various devices and environments
(2) Perform cross-correlation to align the device-degraded speech
and original clean speech
Overall settings

More Related Content

Similar to DDS: A new device-degraded speech dataset for speech enhancement

Ig2 task 1 work sheet s
Ig2 task 1 work sheet sIg2 task 1 work sheet s
Ig2 task 1 work sheet s
Shaz Riches
 
Sound recording glossary
Sound recording glossarySound recording glossary
Sound recording glossary
Colinjason Dockerty
 
Multi-modal Asian Conversation Mobile Video Dataset for Recognition Task
Multi-modal Asian Conversation Mobile Video Dataset for Recognition TaskMulti-modal Asian Conversation Mobile Video Dataset for Recognition Task
Multi-modal Asian Conversation Mobile Video Dataset for Recognition Task
IJECEIAES
 
myukm
myukmmyukm
myukm
Joshgrey16
 
1019上課資料
1019上課資料1019上課資料
1019上課資料
sean0829
 
Adam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheetAdam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheet
copelandadam
 
Adam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheetAdam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheet
copelandadam
 
Adam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheetAdam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheet
copelandadam
 
Wreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognitionWreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognition
Stephen Marquard
 
IG1 Task 1
IG1 Task 1IG1 Task 1
IG1 Task 1
Stuart_Preston
 
Sound recording glossary
Sound recording glossarySound recording glossary
Sound recording glossary
PhillipWynne12281991
 
Curriculum Development of an Audio Processing Laboratory Course
Curriculum Development of an Audio Processing Laboratory CourseCurriculum Development of an Audio Processing Laboratory Course
Curriculum Development of an Audio Processing Laboratory Course
sipij
 
Sound recording glossary preivious
Sound recording glossary preiviousSound recording glossary preivious
Sound recording glossary preivious
PhillipWynne12281991
 
Ig2 task 1 work sheet
Ig2 task 1 work sheetIg2 task 1 work sheet
Ig2 task 1 work sheet
Jordanianmc
 
Coordinated Adaptive Processing.pdf
Coordinated Adaptive Processing.pdfCoordinated Adaptive Processing.pdf
Coordinated Adaptive Processing.pdf
MaiGaber4
 
Richard_Final_Poster
Richard_Final_PosterRichard_Final_Poster
Richard_Final_Poster
Richard Jung
 
IRJET- Linux Based Speaking Medication Reminder System
IRJET- Linux Based Speaking Medication Reminder SystemIRJET- Linux Based Speaking Medication Reminder System
IRJET- Linux Based Speaking Medication Reminder System
IRJET Journal
 
final ppt BATCH 3.pptx
final ppt BATCH 3.pptxfinal ppt BATCH 3.pptx
final ppt BATCH 3.pptx
Mounika715343
 
Ig2 task 1 work sheet
Ig2 task 1 work sheetIg2 task 1 work sheet
Ig2 task 1 work sheet
JoshCollege
 
Sound recording glossary
Sound recording glossarySound recording glossary
Sound recording glossary
PhillipWynne12281991
 

Similar to DDS: A new device-degraded speech dataset for speech enhancement (20)

Ig2 task 1 work sheet s
Ig2 task 1 work sheet sIg2 task 1 work sheet s
Ig2 task 1 work sheet s
 
Sound recording glossary
Sound recording glossarySound recording glossary
Sound recording glossary
 
Multi-modal Asian Conversation Mobile Video Dataset for Recognition Task
Multi-modal Asian Conversation Mobile Video Dataset for Recognition TaskMulti-modal Asian Conversation Mobile Video Dataset for Recognition Task
Multi-modal Asian Conversation Mobile Video Dataset for Recognition Task
 
myukm
myukmmyukm
myukm
 
1019上課資料
1019上課資料1019上課資料
1019上課資料
 
Adam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheetAdam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheet
 
Adam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheetAdam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheet
 
Adam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheetAdam copeland ig2 task 1 work sheet
Adam copeland ig2 task 1 work sheet
 
Wreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognitionWreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognition
 
IG1 Task 1
IG1 Task 1IG1 Task 1
IG1 Task 1
 
Sound recording glossary
Sound recording glossarySound recording glossary
Sound recording glossary
 
Curriculum Development of an Audio Processing Laboratory Course
Curriculum Development of an Audio Processing Laboratory CourseCurriculum Development of an Audio Processing Laboratory Course
Curriculum Development of an Audio Processing Laboratory Course
 
Sound recording glossary preivious
Sound recording glossary preiviousSound recording glossary preivious
Sound recording glossary preivious
 
Ig2 task 1 work sheet
Ig2 task 1 work sheetIg2 task 1 work sheet
Ig2 task 1 work sheet
 
Coordinated Adaptive Processing.pdf
Coordinated Adaptive Processing.pdfCoordinated Adaptive Processing.pdf
Coordinated Adaptive Processing.pdf
 
Richard_Final_Poster
Richard_Final_PosterRichard_Final_Poster
Richard_Final_Poster
 
IRJET- Linux Based Speaking Medication Reminder System
IRJET- Linux Based Speaking Medication Reminder SystemIRJET- Linux Based Speaking Medication Reminder System
IRJET- Linux Based Speaking Medication Reminder System
 
final ppt BATCH 3.pptx
final ppt BATCH 3.pptxfinal ppt BATCH 3.pptx
final ppt BATCH 3.pptx
 
Ig2 task 1 work sheet
Ig2 task 1 work sheetIg2 task 1 work sheet
Ig2 task 1 work sheet
 
Sound recording glossary
Sound recording glossarySound recording glossary
Sound recording glossary
 

More from Yamagishi Laboratory, National Institute of Informatics, Japan

The VoiceMOS Challenge 2022
The VoiceMOS Challenge 2022The VoiceMOS Challenge 2022
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Generalization Ability of MOS Prediction Networks
Generalization Ability of MOS Prediction NetworksGeneralization Ability of MOS Prediction Networks
Generalization Ability of MOS Prediction Networks
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Estimating the confidence of speech spoofing countermeasure
Estimating the confidence of speech spoofing countermeasureEstimating the confidence of speech spoofing countermeasure
Estimating the confidence of speech spoofing countermeasure
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
Yamagishi Laboratory, National Institute of Informatics, Japan
 
How do Voices from Past Speech Synthesis Challenges Compare Today?
 How do Voices from Past Speech Synthesis Challenges Compare Today? How do Voices from Past Speech Synthesis Challenges Compare Today?
How do Voices from Past Speech Synthesis Challenges Compare Today?
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio SynthesisText-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Advancements in Neural Vocoders
Advancements in Neural VocodersAdvancements in Neural Vocoders
Neural Waveform Modeling
Neural Waveform Modeling Neural Waveform Modeling
Neural source-filter waveform model
Neural source-filter waveform modelNeural source-filter waveform model
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Yamagishi Laboratory, National Institute of Informatics, Japan
 
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
Yamagishi Laboratory, National Institute of Informatics, Japan
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
Yamagishi Laboratory, National Institute of Informatics, Japan
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
Yamagishi Laboratory, National Institute of Informatics, Japan
 

More from Yamagishi Laboratory, National Institute of Informatics, Japan (17)

The VoiceMOS Challenge 2022
The VoiceMOS Challenge 2022The VoiceMOS Challenge 2022
The VoiceMOS Challenge 2022
 
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials S...
 
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
 
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
Odyssey 2022: Investigating self-supervised front ends for speech spoofing co...
 
Generalization Ability of MOS Prediction Networks
Generalization Ability of MOS Prediction NetworksGeneralization Ability of MOS Prediction Networks
Generalization Ability of MOS Prediction Networks
 
Estimating the confidence of speech spoofing countermeasure
Estimating the confidence of speech spoofing countermeasureEstimating the confidence of speech spoofing countermeasure
Estimating the confidence of speech spoofing countermeasure
 
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
Attention Back-end for Automatic Speaker Verification with Multiple Enrollmen...
 
How do Voices from Past Speech Synthesis Challenges Compare Today?
 How do Voices from Past Speech Synthesis Challenges Compare Today? How do Voices from Past Speech Synthesis Challenges Compare Today?
How do Voices from Past Speech Synthesis Challenges Compare Today?
 
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio SynthesisText-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
 
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
 
Advancements in Neural Vocoders
Advancements in Neural VocodersAdvancements in Neural Vocoders
Advancements in Neural Vocoders
 
Neural Waveform Modeling
Neural Waveform Modeling Neural Waveform Modeling
Neural Waveform Modeling
 
Neural source-filter waveform model
Neural source-filter waveform modelNeural source-filter waveform model
Neural source-filter waveform model
 
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
 
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform mod...
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
 

Recently uploaded

Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 

Recently uploaded (20)

Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 

DDS: A new device-degraded speech dataset for speech enhancement

  • 1. INTroduction 1. INTroduction DDS: A new device-degraded speech dataset for speech enhancement Haoyu Li1,2, Junichi Yamagishi1,2 1National Institute of Informatics, Japan 2SOKENDAI, Japan INTroduction Conclusion Introduction (1) Noisy speech dataset used for training neural SE model (2) Many existing dataset generated by simulation (3) Model trained on simulated dataset degrades on real recordings a) Real recordings are degraded by multiple joint factors b) RIRs cannot capture the nonlinear distortion (1) Collect real noisy recordings by consumer-grade devices under uncontrolled environments (e.g., using iPhone at home) (2) Release such a parallel noisy speech dataset to facilitate research in speech enhancement Overview of DDS Initial analysis Goals Collection Setup Wed-P-OS-6-1 441 INTERSPEECH 2022 Dataset access Links We release a large-scale device-degraded speech (DDS) dataset with 2,000 hours of real recordings collected under 27 conditions spanning 9 realistic rooms and 3 devices Average PESQ and ESTOI on different conditions Baseline results on test set  D1: 32 hours speech extracted from matched room and matched mic (closed-set)  D2: 32 hours speech extracted from one unmatched room and one unmatched device  D3: 32 hours speech extracted from one unmatched room and two unmatched device  D4: 32 hours speech extracted from four unmatched room and two unmatched device  D5: 32 hours speech extracted from eight unmatched room and two unmatched device Nealy 2,000 hours of speech data collected in 27 conditions:  2 speech materials: DAPS (4 hours) and VCTK subset (8 hours)  9 realistic rooms  3 microphone devices  6 recording positions to collect speech at various noise and reverberation levels (1) Play clean speech recordings using high-quality monitor speaker and re-record waveforms on various devices and environments (2) Perform cross-correlation to align the device-degraded speech and original clean speech Overall settings

Editor's Notes

  1. Our initial study on… Speech is the main media for humans to communicate. But the quality and intelligibility of speech is degraded due to the noise. Speech enhancement is a technique to compensate for such degradation. Consider a real-world scenario, as plotted in this figure. Noise may exist in not only speaker but also listener environments. In speaker environment, the microphone receives a mixture signal of noise and speaker’s voice. So our goal is to suppress noise. This task is referred to noise reduction While in listener environment, noise is physically present around the listener. So what we can do here is to modify the far-end transmitted speech signal to make it more intelligible. This task is referred to listening enhancement. In this study, we jointly consider these two sub-tasks, meaning that we reduce noise in far-end side and improve … This is the proposed model. Here s