Investigating self-supervised front ends
for speech spoofing countermeasures
Xin Wang, Junichi Yamagishi
National Institute of Informatics
ISCA Speaker Odyssey 2022
wangxin@nii.ac.jp
2
Introduction
3
 Anti-spoofing as binary classification
 protect automatic speaker verification (ASV)
 detect fake-voice-based phone scams
Introduction
https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/?sh=88e208875591
Speech
synthesizer
Countermeasure (CM)
Spoofed
Bona fide
4
 CM development flow
 Corpus w/ data protocol:
• ASVspoof (Wu 2015, Todisco 2019), BTAS (Korshunov 2016), FMFCC-A (Zhang 2021),
ADD (Yi 2022)
Introduction
Korshunov, P. et al. Overview of BTAS 2016 speaker anti-spoofing competition. in Proc. BTAS 1–6 (2016). doi:10.1109/BTAS.2016.7791200
Zhang, Z., Gu, Y., Yi, X. & Zhao, X. FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection. arXiv Prepr. arXiv2110.09441 (2021)
Wu, Z. et al. ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. in Proc. Interspeech 2037–2041 (2015). doi:10.21437/Interspeech.2015-462
Todisco, M. et al. ASVspoof 2019: future horizons in spoofed and fake audio detection. in Proc. Interspeech 1008–1012 (2019). doi:10.21437/Interspeech.2019-2249
Yi, J. et al. Add 2022: the first audio deep synthesis detection challenge. in Proc. ICASSP 9216–9220 (2022).
CM
Training set Test set
Anti-spoofing corpus
Unseen speakers
Unseen spoofing methods
5
 CM development flow
Introduction
CM
Training set Test set
Space of all possible bona
fide and spoofed data
En, Fr, Zh, Jp, …
Speaker A, B, C, …
wav, mp3, m4a, ogg …
New spoofing methods
Anti-spoofing corpus
6
 Questions (Paul 2017, Das 2020, Müller 2022)
 Do CMs generalize well? No
 How to improve?
Introduction
* From CQSPIC system in Table 1&2 of (Das 2020)
Paul, D., Sahidullah, M. & Saha, G. Generalization of spoofing countermeasures: A case study with ASVspoof 2015 and BTAS 2016 corpora. in Proc. ICASSP 2047–2051 (2017). doi:10.1109/ICASSP.2017.7952516
Das, R. K., Yang, J. & Li, H. Assessing the scope of generalized countermeasures for anti-spoofing. in Proc. ICASSP 6589–6593 (2020). doi:10.1109/ICASSP40776.2020.9053086
Nicolas M Müller, Pavel Czempin, Franziska Dieckmann, Adam Froghyar, and Konstantin Böttinger. Does Audio Deepfake Detection Generalize? ArXiv Preprint ArXiv:2203.16263. 2022.
CM
Training set Test set 1 Test set 2
ASVspoof
2015 test set
EER <8% EER >20%*
ASVspoof 2019
More
training
data?
7
Methods
8
CM
 General idea
 Feature extraction using a pre-trained self-supervised learning (SSL)
speech model
Method
Training set of corpus 1
Test set of
corpus 1
Test set of
corpus 2
Test set of
corpus 3
SSL
Huge amount of
bona fide data
9
Method
DSP: digital signal processing
T: number of waveform sampling points, N: number of frames, D: number of dimensions in each frame
Sahidullah, M., Kinnunen, T. & Hanilçi, C. A comparison of features for synthetic speech detection. in Proc. Interspeech 2087–2091 (2015).
 CM structure - baseline
 Front end: linear frequency cepstral coefficients (LFCCs) (Sahidullah 2015)
CM
DSP-based
front end
Back end
Input wav.
Spoofed
Bona fide
10
Method
D and N are decided by the CNN configuration
Baevski, A., Zhou, Y., Mohamed, A. & Auli, M. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. in Proc. NeurIPS vol. 33 12449–12460 (2020).
 CM structure - investigated
CM
Pre-trained
SSL-based
front end
Back end
Input wav.
Spoofed
Bona fide
For example, wav2vec 2.0 (Baevski 2020)
*
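As a rough illustration of how the CNN configuration fixes N, the sketch below applies the standard convolution output-length formula layer by layer. The kernel widths and strides are the values commonly reported for the wav2vec 2.0 feature encoder; they are an assumption here, not something stated on the slide, and `num_frames` is a hypothetical helper name.

```python
# Kernel widths and strides of the wav2vec 2.0 convolutional feature encoder.
# Assumed values: check the configuration of the checkpoint you actually use.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples, kernels=KERNELS, strides=STRIDES):
    """Return N, the number of output frames for a waveform of T samples."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        if length < kernel:
            return 0  # waveform too short to produce any frame
        # Output length of an unpadded 1-D convolution
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    for seconds in (1, 4):
        t = 16000 * seconds  # 16 kHz sampling rate (assumption)
        print(f"T = {t} samples -> N = {num_frames(t)} frames")
```

Under these assumed settings the frame shift is 320 samples, i.e. about 20 ms at 16 kHz.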
11
Method
 CM structure - investigated
CM
Pre-trained
SSL-based
front end
Back end
Input wav.
Spoofed
Bona fide
For example, wav2vec 2.0 (Baevski 2020)
 pre-trained using a self-contrastive loss and a huge
amount of bona fide waveforms
 Languages, speakers, background noises, reverberation, codec
…
 many pre-trained SSL models available
12
Method
 CM structure - investigated
 Should it be fine-tuned?
 How to design back end?
• DNN, or a single output layer
CM
Pre-trained
SSL-based
front end
Back end
Input wav.
Spoofed
Bona fide
13
Experiments
14
CM
 Data & protocol
Experiment
LA: logical access, DF: deepfake
Christophe Veaux, Junichi Yamagishi, Kirsten MacDonald, and others. VCTK Corpus: English Multi-Speaker Corpus for Cstr Voice Cloning Toolkit. University of Edinburgh. The Centre for Speech Technology Research (CSTR). 2016.
Gautham J. Mysore, “Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech? - A dataset, insights, and challenges,” IEEE Signal Process. Lett., vol. 22, no. 8, pp. 1006–1010, 2015.
M. Wester, “The EMIME bilingual database,” The University of Edinburgh, Tech. Rep., 2010.
ASVspoof 2019
LA training set
ASVspoof2019
LA test set
ASVspoof2015
test set
ASVspoof2021
LA
VCTK
VCTK
VCTK
ASVspoof2021
Deepfake
eval set
DAPS
EMIME
Clean
En
En
En
En
En, Fi,
De, Zh
Clean
Codec, transmission
Compression
Spoofing methods are different!
15
Global average pooling
Light CNN
Fully connected
 CM configuration – back end
 Baseline: LCNN-variant (LLGF) (Wang 2021)
• LCNN + LSTM + GAP + FC
 Experimental models
• Which type of back end?
– LLGF: LCNN + LSTM + GAP + FC
– LGF: LSTM + GAP + FC
– GF: GAP + FC
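To make the simplest back end concrete, here is a minimal NumPy sketch of GF (GAP + FC): frame-level features are averaged over time and mapped to one score by a single linear layer. The function name `gf_back_end`, the scalar output, and the random parameters are illustrative assumptions; the actual models were implemented and trained in PyTorch.

```python
import numpy as np


def gf_back_end(features, weight, bias):
    """GAP + FC: average frame features over time, then apply one linear layer.

    features: array of shape (N, D), N frames with D dimensions each
    weight:   array of shape (D,), parameters of the output layer
    bias:     scalar offset of the output layer
    Returns one CM score for the whole utterance.
    """
    pooled = features.mean(axis=0)          # global average pooling -> (D,)
    return float(pooled @ weight + bias)    # fully connected layer -> scalar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16                                # D (illustrative value)
    weight, bias = rng.normal(size=dim), 0.0
    # Pooling removes the time axis, so utterances of any length N are accepted.
    for n_frames in (49, 199):
        feats = rng.normal(size=(n_frames, dim))
        print(n_frames, gf_back_end(feats, weight, bias))
```

LGF and LLGF would insert an LSTM, or an LCNN followed by an LSTM, before the pooling step.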
Experiment
Xiang Wu, Ran He, Zhenan Sun, and Tieniu Tan. A Light CNN for Deep Face Representation with Noisy Labels. IEEE Transactions on Information Forensics and Security 13 (11). IEEE:
2884–2896. doi:10.1109/TIFS.2018.2833032. 2018.
Wang, X. & Yamagishi, J. A comparative study on recent neural spoofing countermeasures for synthetic speech detection. in Proc. Interspeech 4259–4263 (2021).
doi:10.21437/Interspeech.2021-702
Baseline Experimental models
Back
end
Front
end
16
 CM configuration – front end
 Baseline: LFCC
• Configuration from ASVspoof 2019*
 Experimental models
• SSL front end: Wav2vec 2.0 XLSR
• Should we fine-tune the front end on the
ASVspoof 2019 training set?
– If yes:
1. Initialize front end using pre-trained SSL
2. Train the whole CM w/ a small learning rate
– If no:
1. Initialize front end using pre-trained SSL
2. Fix the front end, train the back end
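The two training options above can be sketched in PyTorch as follows. This is only a toy: `TinyCM` uses a single convolution as a stand-in for the pre-trained SSL model, and `build_optimizer`, the Adam optimizer, and both learning-rate values are assumptions made for illustration rather than the settings used in the study.

```python
import torch
from torch import nn


class TinyCM(nn.Module):
    """Toy CM: a stand-in front end followed by a GAP + FC back end."""

    def __init__(self, feat_dim=8):
        super().__init__()
        # Stand-in for a pre-trained SSL model (hypothetical); a real system
        # would load pre-trained weights here.
        self.front_end = nn.Conv1d(1, feat_dim, kernel_size=10, stride=5)
        self.back_end = nn.Linear(feat_dim, 1)

    def forward(self, wav):                              # wav: (batch, T)
        feats = self.front_end(wav.unsqueeze(1))         # (batch, D, N)
        return self.back_end(feats.mean(dim=2)).squeeze(1)  # (batch,)


def build_optimizer(model, fine_tune, lr=1e-3, small_lr=1e-6):
    """Return an optimizer for the fine-tuned or the fixed front-end setting."""
    if fine_tune:
        # Option "yes": train the whole CM with a small learning rate.
        for param in model.front_end.parameters():
            param.requires_grad_(True)
        return torch.optim.Adam(model.parameters(), lr=small_lr)
    # Option "no": fix the front end and train the back end only.
    for param in model.front_end.parameters():
        param.requires_grad_(False)
    return torch.optim.Adam(model.back_end.parameters(), lr=lr)


if __name__ == "__main__":
    model = TinyCM()
    optimizer = build_optimizer(model, fine_tune=False)
    wav = torch.randn(4, 16000)                     # four random "utterances"
    labels = torch.tensor([1.0, 0.0, 1.0, 0.0])     # 1 = bona fide, 0 = spoofed
    loss = nn.functional.binary_cross_entropy_with_logits(model(wav), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"training loss: {loss.item():.4f}")
```

With a real SSL model, the fixed setting would usually also keep the front end in evaluation mode so that dropout and similar layers stay inactive.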
Experiment
* https://www.asvspoof.org/asvspoof2019/ASVspoof_2019_baseline_CM_v1.zip
Baseline Experimental models
Back
end
Front
end
17
Results in EER (%)
See full results in the paper
See min-tDCF and other results in Appendix: https://arxiv.org/abs/2111.07725
ASVspoof 2019 LA
ASVspoof 2015 LA
ASVspoof 2021 LA eval
ASVspoof 2021 DF eval
Baseline EERs on
SSL (fixed) SSL (fine-tuned)
18
Results in EER (%)
2019 LA
2015 LA
2021 LA eval
2021 DF eval
Baseline:
• low EER on ASVspoof 2019 LA test set
• high EER on ASVspoof 2015 and 2021!
19
Results in EER (%)
2019 LA
2015 LA
2021 LA eval
2021 DF eval
Using a pre-trained SSL
front end is good
Fine-tuning the pre-trained SSL
front end is better
20
Results in EER (%)
2019 LA
2015 LA
2021 LA eval
2021 DF eval
Should we fine tune the SSL front end?
• Yes
Which type of back end is better?
• If we fine-tune the front end, the difference is small
21
Summary
22
Summary
Zhang, Z., Gu, Y., Yi, X. & Zhao, X. FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection. arXiv Prepr. arXiv2110.09441 (2021)
Frank, J. & Schönherr, L. WaveFake: A Data Set to Facilitate Audio DeepFake Detection. in Proc. NeurIPS Datasets and Benchmarks 2021 accepted (2021).
 Do SSL front ends generalize better than baseline?
 Yes
 Recommendations from this study
Large SSL model pre-trained on diverse data
Fine-tuning the SSL front end
 However, the best model is not sufficiently generalizable
 https://arxiv.org/pdf/2203.14553.pdf
 EER > 30% on FMFCC-A Mandarin data (Zhang 2021)
 EER > 15% on WaveFake English/Japanese data (Frank 2021)
23
Summary
It is comparable to XLS-53 without data augmentation in (Juan M Martin-Donas 2022)
Juan M Martin-Donas, and Aitor Álvarez. The Vicomtech Audio Deepfake Detection System Based on Wav2vec2 for the 2022 ADD Challenge. In Proc. ICASSP, 9241–9245. 2022.
Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, and Nicholas Evans. Automatic Speaker Verification Spoofing and Deepfake Detection Using
Wav2vec 2.0 and Data Augmentation. In Proc. Odyssey 2022 The Speaker and Language Recognition Workshop. Vol. accepted. 2022.
 Other findings in the paper & appendix
 Why SSL outperformed DSP front end?
• It ignores spurious “evidence” in high-frequency band of training data
 Which pre-trained SSL model is better?
 Detailed EERs on each codec, spoofing methods …
 Findings from related works
 Data augmentation for the CM is useful (Martin-Donas 2022, Tak 2022)
 Back-end can be further improved (Tak 2022)
24
Thank you
https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts#26-speech-anti-spoofing-with-ssl-front-end
26
Appendix
27
 CM configuration – front end
 Baseline: LFCC
• Configuration from ASVspoof 2019*
• 60 dim. per frame (static, Δ, Δ²)
 Experimental models
• Which SSL model?
• Should we fine-tune the front end on the
ASVspoof 2019 training set?
– If yes:
1. Initialize front end using pre-trained SSL
2. Train the whole CM w/ a small learning rate
– If no:
1. Initialize front end using pre-trained SSL
2. Fix the front end, train the back end only
Experiment
* https://www.asvspoof.org/asvspoof2019/ASVspoof_2019_baseline_CM_v1.zip
Baseline Experimental models
Back
end
Front
end
28
Results in EER (%)
2019 LA
2015 LA
2021 LA eval
2021 DF eval
SSL model size    Large      Extra Large  Large              Large  Small
Multi-lingual?    Yes (>50)  En           Yes (>50)          En     En
SSL data (hour)   ~55k       >60k         En >61k, else 4k   >60k   ~1k
Which pre-trained SSL front end?
29
Results in EER (%) Low EER High EER
The three train-evaluation rounds (I, II, III)
Front end
Back end
Test
set
Not fine-tuned (pre-trained model was fixed) fine-tuned
30
CM
Feature
extraction
(front end)
Scoring
(back end)
waveform
 Anti-spoofing as binary classification

 Evaluation metrics
• equal error rate (EER)
• min-tDCF (Kinnunen 2020)
Introduction
Kinnunen, T. et al. Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals. IEEE/ACM
Trans. Audio, Speech, Lang. Process. 28, 2195–2210 (2020)
Spoofed
Bona fide
CM score s
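For readers unfamiliar with the EER, the following sketch shows one simple way to estimate it from CM scores. It assumes that a higher score s means "more likely bona fide" and scans every observed score as a candidate threshold; `compute_eer` is a hypothetical helper, and evaluation toolkits typically use faster, interpolated implementations.

```python
def compute_eer(bona_fide_scores, spoofed_scores):
    """Return (EER, threshold), assuming higher scores indicate bona fide speech."""
    thresholds = sorted(set(bona_fide_scores) | set(spoofed_scores))
    best = None
    for threshold in thresholds:
        # Miss rate: bona fide trials rejected (score below the threshold)
        miss = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
        # False-alarm rate: spoofed trials accepted (score at or above the threshold)
        false_alarm = sum(s >= threshold for s in spoofed_scores) / len(spoofed_scores)
        gap = abs(false_alarm - miss)
        if best is None or gap < best[0]:
            # The EER is taken where the two error rates are closest
            best = (gap, (false_alarm + miss) / 2, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    bona_fide = [0.9, 0.8, 0.3, 0.7]
    spoofed = [0.1, 0.2, 0.75, 0.4]
    eer, threshold = compute_eer(bona_fide, spoofed)
    print(f"EER = {100 * eer:.1f}% at threshold {threshold}")
```

The double loop is quadratic in the number of trials, which is fine for a toy example but too slow for full evaluation sets.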
31
ASVspoof 2021 DF
Cond.  Compression (Quality)      VBR settings (kbps)
C1     None                       -
C2     mp3 (low)                  ~80-120
C3     mp3 (high)                 ~220-260
C4     m4a (low)                  ~20-32
C5     m4a (high)                 ~96-112
C6     ogg (low)                  ~80-96
C7     ogg (high)                 ~256-320
C8     mp3 (low) => m4a (high)    ~80-120 => ~96-112
C9     ogg (low) => m4a (high)    ~80-96 => ~96-112
Compression
Methods:
Audio data:
Double
compression
Single
compression
No
compression
32
Analysis
33
[Figure: illustration of results for trained CMs (LFCC + LLGF; W2V-Large2, fixed, + LLGF; W2V-XLSR, fixed, + LLGF) across test sets; the numeric values are not recoverable from the extraction]
 Sub-band analysis after training
 Inspired by (Tak 2020)
 Given a trained CM:
Tak, H., Patino, J., Nautsch, A., Evans, N. & Todisco, M. An Explainability Study of the Constant Q Cepstral Coefficient Spoofing
Countermeasure for Automatic Speaker Verification. in Proc. Odyssey 333–340 (2020). doi:10.21437/Odyssey.2020-47
Band-stop filter
(0.8 - 2.4 kHz)
Distribution of s
s
s
…
Analysis
Good
discrimination
Poor
discrimination
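The band-stop step of this analysis can be sketched with SciPy as below. The Butterworth design, the filter order, and the helper names `band_stop` and `rms` are assumptions for illustration; the slide only specifies the stop band (0.8 - 2.4 kHz in the example). In the actual analysis, the filtered waveforms would be scored by the trained CM and the score distributions of bona fide and spoofed trials compared.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def band_stop(wav, low_hz, high_hz, sample_rate=16000, order=8):
    """Remove the band [low_hz, high_hz] from a waveform (zero-phase filtering)."""
    sos = butter(order, [low_hz, high_hz], btype="bandstop",
                 fs=sample_rate, output="sos")
    return sosfiltfilt(sos, wav)


def rms(signal):
    """Root-mean-square amplitude, used here to check what the filter removed."""
    return float(np.sqrt(np.mean(np.square(signal))))


if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    inside = np.sin(2 * np.pi * 1600 * t)    # tone inside the stop band
    outside = np.sin(2 * np.pi * 200 * t)    # tone outside the stop band
    for name, tone in (("1600 Hz", inside), ("200 Hz", outside)):
        kept = rms(band_stop(tone, 800, 2400, sample_rate)) / rms(tone)
        print(f"{name}: {100 * kept:.1f}% of the RMS amplitude remains")
```

If discrimination degrades sharply after a band is removed, the CM presumably depends on information in that band.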
34
 Results on ASVspoof
2019 LA test subset
 Baseline relies more on
high-freq. “evidence”?
 SSL front end is more
sensitive to the low-freq. band
(F1, the first speech formant)
Analysis
See full results in the paper
[Figure: values per CM in extraction order; column labels are not recoverable; axis label: CM score]
LFCC, LLGF:                    4.0, 4.1, 16.0, 9.0, 9.8, 12.0, 46.0, 34.0
W2V-Large2, Fixed, LLGF:       2.0, 2.9, 28.0, 15.0, 1.1, 1.1, 2.0, 2.0
W2V-XLSR, Fixed, LLGF:         3.1, 4.0, 35.0, 13.2, 5.0, 5.0, 3.2, 6.0
W2V-XLSR, Fine-tuned, LLGF:    2.1, 4.0, 29.9, 4.8, 4.0, 3.0, 3.1, 3.2
W2V-XLSR, Fine-tuned, GF:      3.8, 5.0, 29.9, 5.8, 4.2, 3.8, 3.2, 3.1
35
[Figure: values per CM in extraction order; column labels are not recoverable; axis label: CM score]
LFCC, LLGF:                    25.2, 24.5, 27.4, 31.4, 32.7, 32.0, 53.6, 43.7
W2V-Large2, Fixed, LLGF:       8.0, 9.7, 36.7, 17.7, 10.3, 8.8, 8.5, 8.1
W2V-XLSR, Fixed, LLGF:         11.7, 13.8, 33.2, 18.5, 12.9, 12.5, 12.1, 13.4
W2V-XLSR, Fine-tuned, LLGF:    4.5, 5.6, 33.9, 13.4, 5.6, 4.6, 4.2, 4.4
W2V-XLSR, Fine-tuned, GF:      5.5, 6.5, 34.9, 19.9, 6.4, 5.7, 5.5, 5.7
 Results on ASVspoof
2021 DF test subset
 Baseline relies more on
high-freq. “evidence”?
 SSL front end is more
sensitive to the low-freq. band
(F1, the first speech formant)
 Similar patterns on other
test sets and SSL front
ends
Analysis
See full results in the paper

 
How do Voices from Past Speech Synthesis Challenges Compare Today?
 How do Voices from Past Speech Synthesis Challenges Compare Today? How do Voices from Past Speech Synthesis Challenges Compare Today?
How do Voices from Past Speech Synthesis Challenges Compare Today?
 
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio SynthesisText-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
 
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
 
Advancements in Neural Vocoders
Advancements in Neural VocodersAdvancements in Neural Vocoders
Advancements in Neural Vocoders
 
Neural Waveform Modeling
Neural Waveform Modeling Neural Waveform Modeling
Neural Waveform Modeling
 
Neural source-filter waveform model
Neural source-filter waveform modelNeural source-filter waveform model
Neural source-filter waveform model
 
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tactron and related...
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 2)~
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Odyssey 2022: Investigating self-supervised front ends for speech spoofing countermeasures

  • 1. Investigating self-supervised front ends for speech spoofing countermeasures Xin Wang, Junichi Yamagishi National Institute of Informatics ISCA Speaker Odyssey 2022 wangxin@nii.ac.jp
  • 3. Introduction: anti-spoofing as binary classification
      - A countermeasure (CM) classifies input speech as bona fide or spoofed (e.g., output of a speech synthesizer).
      - Protect automatic speaker verification (ASV).
      - Detect fake-voice-based phone scams.
      - Example: https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/?sh=88e208875591
  • 4. Introduction: CM development flow
      - Corpus with data protocol: ASVspoof (Wu 2015, Todisco 2019), BTAS (Korshunov 2016), FMFCC-A (Zhang 2021), ADD (Yi 2022).
      - The CM is built on the training set of an anti-spoofing corpus and evaluated on its test set, which contains unseen speakers and unseen spoofing methods.
      - Korshunov, P. et al. Overview of BTAS 2016 speaker anti-spoofing competition. in Proc. BTAS 1–6 (2016). doi:10.1109/BTAS.2016.7791200
      - Zhang, Z., Gu, Y., Yi, X. & Zhao, X. FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection. arXiv Prepr. arXiv2110.09441 (2021)
      - Wu, Z. et al. ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. in Proc. Interspeech 2037–2041 (2015). doi:10.21437/Interspeech.2015-462
      - Todisco, M. et al. ASVspoof 2019: future horizons in spoofed and fake audio detection. in Proc. Interspeech 1008–1012 (2019). doi:10.21437/Interspeech.2019-2249
      - Yi, J. et al. ADD 2022: the first audio deep synthesis detection challenge. in Proc. ICASSP 9216–9220 (2022).
  • 5. Introduction: CM development flow (continued)
      - The CM is trained and tested on one anti-spoofing corpus (training set, test set).
      - Space of all possible bona fide and spoofed data: languages (En, Fr, Zh, Jp, …), speakers (A, B, C, …), audio formats (wav, mp3, m4a, ogg, …), new spoofing methods.
  • 6. Introduction: questions (Paul 2017, Das 2020, Müller 2022)
      - Does a CM generalize well? No.
      - Figure labels: CM, training set, test set 1, test set 2; ASVspoof 2015 test set, ASVspoof 2019; EER < 8%, EER > 20%*.
      - How to improve? More training data?
      - * From the CQSPIC system in Tables 1 and 2 of (Das 2020).
      - Paul, D., Sahidullah, M. & Saha, G. Generalization of spoofing countermeasures: A case study with ASVspoof 2015 and BTAS 2016 corpora. in Proc. ICASSP 2047–2051 (2017). doi:10.1109/ICASSP.2017.7952516
      - Das, R. K., Yang, J. & Li, H. Assessing the scope of generalized countermeasures for anti-spoofing. in Proc. ICASSP 6589–6593 (2020). doi:10.1109/ICASSP40776.2020.9053086
      - Müller, N. M., Czempin, P., Dieckmann, F., Froghyar, A. & Böttinger, K. Does Audio Deepfake Detection Generalize? arXiv Prepr. arXiv:2203.16263 (2022).
  • 8. Method: general idea
      - Feature extraction using a pre-trained self-supervised learning (SSL) speech model.
      - The SSL model is pre-trained on a huge amount of bona fide data.
      - The CM is trained on the training set of corpus 1 and evaluated on the test sets of corpus 1, corpus 2, and corpus 3.
  • 9. Method: CM structure, baseline
      - Pipeline: input waveform → DSP-based front end → back end → spoofed / bona fide.
      - Front end: linear frequency cepstrum coefficients (LFCCs) (Sahidullah 2015).
      - DSP: digital signal processing. T: number of waveform sampling points, N: number of frames, D: number of dimensions in each frame.
      - Sahidullah, M., Kinnunen, T. & Hanilçi, C. A comparison of features for synthetic speech detection. in Proc. Interspeech 2087–2091 (2015).
  • 10. Method: CM structure, investigated
      - Pipeline: input waveform → pre-trained SSL-based front end → back end → spoofed / bona fide.
      - Front-end example: wav2vec 2.0 (Baevski 2020)*.
      - * D and N are decided by the CNN configuration.
      - Baevski, A., Zhou, Y., Mohamed, A. & Auli, M. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. in Proc. NIPS vol. 33 12449–12460 (2020).
  • 11. Method: CM structure, investigated (continued)
      - Front-end example: wav2vec 2.0 (Baevski 2020).
      - Pre-trained using a self-contrastive loss and a huge amount of bona fide waveforms covering varied languages, speakers, background noises, reverberation, codecs, …
      - Many pre-trained SSL models are available.
  • 12. Method: CM structure, investigated (open questions)
      - Should the pre-trained SSL front end be fine-tuned?
      - How to design the back end: a DNN, or a single output layer?
  • 14. Experiment: data & protocol
      - Training: ASVspoof 2019 LA training set.
      - Test: ASVspoof 2019 LA test set, ASVspoof 2015 test set, ASVspoof 2021 LA, ASVspoof 2021 Deepfake eval set.
      - Figure labels for source data: VCTK, DAPS, EMIME; languages: En; En, Fi, De, Zh; conditions: clean; codec, transmission; compression.
      - Spoofing methods are different across the sets!
      - LA: logical access, DF: deepfake.
      - Veaux, C., Yamagishi, J., MacDonald, K. et al. VCTK Corpus: English Multi-Speaker Corpus for CSTR Voice Cloning Toolkit. University of Edinburgh, The Centre for Speech Technology Research (CSTR) (2016).
      - Mysore, G. J. Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech? A dataset, insights, and challenges. IEEE Signal Process. Lett. 22 (8), 1006–1010 (2015).
      - Wester, M. The EMIME bilingual database. The University of Edinburgh, Tech. Rep. (2010).
  • 15. Experiment: CM configuration, back end
      - Baseline: LCNN variant (LLGF) (Wang 2021) = LCNN + LSTM + GAP + FC.
      - Experimental models: which type of back end?
          LLGF: LCNN + LSTM + GAP + FC
          LGF: LSTM + GAP + FC
          GF: GAP + FC
      - LCNN: light CNN (Wu 2018), GAP: global average pooling, FC: fully connected layer.
      - Wu, X., He, R., Sun, Z. & Tan, T. A Light CNN for Deep Face Representation with Noisy Labels. IEEE Trans. Inf. Forensics Secur. 13 (11), 2884–2896 (2018). doi:10.1109/TIFS.2018.2833032
      - Wang, X. & Yamagishi, J. A comparative study on recent neural spoofing countermeasures for synthetic speech detection. in Proc. Interspeech 4259–4263 (2021). doi:10.21437/Interspeech.2021-702
  • 16. Experiment: CM configuration, front end
      - Baseline: LFCC, with the configuration from ASVspoof 2019*.
      - Experimental models: SSL front end, wav2vec 2.0 XLSR.
      - Should we fine-tune the front end on the ASVspoof 2019 training set?
          If yes: 1. initialize the front end using the pre-trained SSL model; 2. train the whole CM with a small learning rate.
          If no: 1. initialize the front end using the pre-trained SSL model; 2. fix the front end and train the back end.
      - * https://www.asvspoof.org/asvspoof2019/ASVspoof_2019_baseline_CM_v1.zip
  • 17. Results in EER (%)
      - [Figure: EERs on ASVspoof 2019 LA, ASVspoof 2015 LA, ASVspoof 2021 LA eval, and ASVspoof 2021 DF eval for the baseline, SSL (fixed), and SSL (fine-tuned).]
      - See full results in the paper. See min-tDCF and other results in the Appendix: https://arxiv.org/abs/2111.07725
  • 18. Results in EER (%): baseline (2019 LA, 2015 LA, 2021 LA eval, 2021 DF eval)
      - Low EER on the ASVspoof 2019 LA test set.
      - High EER on ASVspoof 2015 and 2021!
  • 19. Results in EER (%): SSL front ends (2019 LA, 2015 LA, 2021 LA eval, 2021 DF eval)
      - Using a pre-trained SSL front end is good.
      - Fine-tuning the pre-trained SSL front end is better.
  • 20. Results in EER (%): takeaways (2019 LA, 2015 LA, 2021 LA eval, 2021 DF eval)
      - Should we fine-tune the SSL front end? Yes.
      - Which type of back end is better? If we fine-tune the front end, the difference is small.
  • 22. Summary
      - Do SSL front ends generalize better than the baseline? Yes.
      - Recommendations from this study: use a large SSL model pre-trained on diverse data, and fine-tune the SSL front end.
      - However, the best model is not sufficiently generalizable (https://arxiv.org/pdf/2203.14553.pdf):
          EER > 30% on FMFCC-A Mandarin data (Zhang 2021)
          EER > 15% on WaveFake English/Japanese data (Frank 2021)
      - Frank, J. & Schönherr, L. WaveFake: A Data Set to Facilitate Audio DeepFake Detection. in Proc. NeurIPS Datasets and Benchmarks 2021, accepted (2021).
  • 23. Summary (continued)
      - Other findings in the paper & appendix:
          Why did SSL outperform the DSP front end? It ignores spurious "evidence" in the high-frequency band of the training data.
          Which pre-trained SSL model is better?
          Detailed EERs on each codec, spoofing method, …
      - Findings from related works:
          Data augmentation for the CM is useful (Martin-Donas 2022, Tak 2022).
          The back end can be further improved (Tak 2022).
      - Note: it is comparable to XLS-53 without data augmentation in (Martin-Donas 2022).
      - Martin-Donas, J. M. & Álvarez, A. The Vicomtech Audio Deepfake Detection System Based on Wav2vec2 for the 2022 ADD Challenge. in Proc. ICASSP 9241–9245 (2022).
      - Tak, H., Todisco, M., Wang, X., Jung, J., Yamagishi, J. & Evans, N. Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation. in Proc. Odyssey 2022 The Speaker and Language Recognition Workshop, accepted (2022).
  • 26. Experiment: CM configuration, front end (detailed)
      - Baseline: LFCC, with the configuration from ASVspoof 2019*; 60 dimensions per frame (static, Δ, Δ²).
      - Experimental models: which SSL model?
      - Should we fine-tune the front end on the ASVspoof 2019 training set?
          If yes: 1. initialize the front end using the pre-trained SSL model; 2. train the whole CM with a small learning rate.
          If no: 1. initialize the front end using the pre-trained SSL model; 2. fix the front end and train the back end only.
      - * https://www.asvspoof.org/asvspoof2019/ASVspoof_2019_baseline_CM_v1.zip
  • 27. Results in EER (%): which pre-trained SSL front end? (2019 LA, 2015 LA, 2021 LA eval, 2021 DF eval)
      - Properties of the five compared SSL models, in slide column order:
          SSL model size:   Large     | Extra Large | Large              | Large | Small
          Multi-lingual?:   Yes (>50) | En          | Yes (>50)          | En    | En
          SSL data (hour):  ~55k      | >60k        | En >61k, else 4k   | >60k  | ~1k
  • 28. Results in EER (%)
      - [Figure: EERs (color scale from low EER to high EER) over the three train-evaluation rounds (I, II, III), organized by front end, back end, and test set, for front ends that were not fine-tuned (pre-trained model was fixed) and fine-tuned.]
  • 29. Introduction: anti-spoofing as binary classification
      - CM pipeline: waveform → feature extraction (front end) → scoring (back end) → CM score s → spoofed / bona fide.
      - Evaluation metrics: equal error rate (EER) and min-tDCF (Kinnunen 2020).
      - Kinnunen, T. et al. Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals. IEEE/ACM Trans. Audio, Speech, Lang. Process. 28, 2195–2210 (2020)
  • 30. ASVspoof 2021 DF: compression methods applied to the audio data
          Cond. | Compression (quality)     | VBR settings (kbps)
          C1    | None                      | -
          C2    | mp3 (low)                 | ~80-120
          C3    | mp3 (high)                | ~220-260
          C4    | m4a (low)                 | ~20-32
          C5    | m4a (high)                | ~96-112
          C6    | ogg (low)                 | ~80-96
          C7    | ogg (high)                | ~256-320
          C8    | mp3 (low) => m4a (high)   | ~80-120 => ~96-112
          C9    | ogg (low) => m4a (high)   | ~80-96 => ~96-112
      - C1: no compression; C2 to C7: single compression; C8 and C9: double compression.
  • 32. Analysis: sub-band analysis after training
      - Inspired by (Tak 2020).
      - Given a trained CM: apply a band-stop filter (e.g., 0.8 - 2.4 kHz) to the test set, score the filtered data, and inspect the distribution of the CM score s.
      - The score distributions show either good discrimination or poor discrimination between bona fide and spoofed data.
      - [Figure: EER table for CMs trained on ASVspoof 2019 LA (LFCC + LLGF, W2V-Large2 fixed + LLGF, W2V-XLSR fixed + LLGF, …), partially visible.]
      - Tak, H., Patino, J., Nautsch, A., Evans, N. & Todisco, M. An Explainability Study of the Constant Q Cepstral Coefficient Spoofing Countermeasure for Automatic Speaker Verification. in Proc. Odyssey 333–340 (2020). doi:10.21437/Odyssey.2020-47
  • 33. Analysis: results on the ASVspoof 2019 LA test subset
      - Does the baseline rely more on high-frequency "evidence"?
      - The SSL front end is more sensitive to the low-frequency band (F1 in speech formant).
      - EER (%) per condition, in slide column order (figure also shows CM score distributions):
          LFCC, LLGF:                  4.0 | 4.1 | 16.0 | 9.0  | 9.8 | 12.0 | 46.0 | 34.0
          W2V-Large2, fixed, LLGF:     2.0 | 2.9 | 28.0 | 15.0 | 1.1 | 1.1  | 2.0  | 2.0
          W2V-XLSR, fixed, LLGF:       3.1 | 4.0 | 35.0 | 13.2 | 5.0 | 5.0  | 3.2  | 6.0
          W2V-XLSR, fine-tuned, LLGF:  2.1 | 4.0 | 29.9 | 4.8  | 4.0 | 3.0  | 3.1  | 3.2
          W2V-XLSR, fine-tuned, GF:    3.8 | 5.0 | 29.9 | 5.8  | 4.2 | 3.8  | 3.2  | 3.1
      - See full results in the paper.
  • 34. Analysis: results on the ASVspoof 2021 DF test subset
      - Does the baseline rely more on high-frequency "evidence"?
      - The SSL front end is more sensitive to the low-frequency band (F1 in speech formant).
      - Similar patterns appear on other test sets and SSL front ends.
      - EER (%) per condition, in slide column order (figure also shows CM score distributions):
          LFCC, LLGF:                  25.2 | 24.5 | 27.4 | 31.4 | 32.7 | 32.0 | 53.6 | 43.7
          W2V-Large2, fixed, LLGF:     8.0  | 9.7  | 36.7 | 17.7 | 10.3 | 8.8  | 8.5  | 8.1
          W2V-XLSR, fixed, LLGF:       11.7 | 13.8 | 33.2 | 18.5 | 12.9 | 12.5 | 12.1 | 13.4
          W2V-XLSR, fine-tuned, LLGF:  4.5  | 5.6  | 33.9 | 13.4 | 5.6  | 4.6  | 4.2  | 4.4
          W2V-XLSR, fine-tuned, GF:    5.5  | 6.5  | 34.9 | 19.9 | 6.4  | 5.7  | 5.5  | 5.7
      - See full results in the paper.
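
The sub-band analysis in the slides applies a band-stop filter (for example 0.8 - 2.4 kHz) to test waveforms before scoring them with a trained CM. The following is a minimal sketch of that preprocessing step only, assuming an idealized FFT-based brick-wall filter; the slides do not say which filter design was used, and `band_stop_fft` and its parameters are hypothetical names introduced here for illustration.

```python
import numpy as np


def band_stop_fft(wav, sample_rate, low_hz, high_hz):
    """Remove the [low_hz, high_hz] band from a 1-D waveform.

    Assumption: an ideal brick-wall filter applied in the frequency
    domain, not necessarily the filter used in the original study.
    """
    wav = np.asarray(wav, dtype=float)
    spec = np.fft.rfft(wav)
    # Frequency (Hz) of each rFFT bin
    freqs = np.fft.rfftfreq(len(wav), d=1.0 / sample_rate)
    # Zero every bin that falls inside the stop band
    stop = (freqs >= low_hz) & (freqs <= high_hz)
    spec[stop] = 0.0
    return np.fft.irfft(spec, n=len(wav))


# Toy input: one tone inside the stop band, one outside it
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
toy_wav = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 4000 * t)
toy_filtered = band_stop_fft(toy_wav, sample_rate, 800.0, 2400.0)
```

In an analysis of this kind, the filtered waveforms would then be passed to the trained CM, and the resulting score distributions or EERs compared across stop bands.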
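
The simplest back end compared in the slides is GF, that is, global average pooling (GAP) followed by a fully connected (FC) layer. A minimal sketch of that idea, assuming the front end outputs N frames of D dimensions and the FC layer produces a single scalar score; `gf_back_end`, the weight shapes, and the random values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gf_back_end(features, weight, bias):
    """GAP + FC sketch.

    features: array of shape (N, D), one D-dim vector per frame
    weight:   array of shape (D,), hypothetical FC weights
    bias:     scalar, hypothetical FC bias
    Returns one scalar score for the whole utterance.
    """
    features = np.asarray(features, dtype=float)
    # Global average pooling: collapse the N frames into one D-dim vector
    pooled = features.mean(axis=0)
    # Fully connected layer with a single output unit
    return float(pooled @ np.asarray(weight, dtype=float) + bias)


# Utterances of different lengths map to a single score each
rng = np.random.default_rng(0)
fc_weight = rng.normal(size=8)
score_short = gf_back_end(rng.normal(size=(50, 8)), fc_weight, 0.0)
score_long = gf_back_end(rng.normal(size=(300, 8)), fc_weight, 0.0)
```

Because pooling averages over the frame axis, the same FC layer handles inputs of any duration, which is why the number of frames N does not need to be fixed.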

Editor's Notes

  1-7, 9-12. Standard framework: measure EER over a test set
  8. Data for fine-tuning: ASVspoof 2019 training set
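
The notes above refer to measuring the equal error rate (EER) over a test set. A minimal sketch of one common way to compute it from CM scores, assuming that a higher score means "more likely bona fide" and using a simple threshold sweep without interpolation; `compute_eer` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) from two sets of CM scores.

    Assumption: higher score = more bona fide. A trial is accepted as
    bona fide when its score is >= threshold.
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every observed score, plus one above all of them
    thresholds = np.sort(np.concatenate([bona, spoof]))
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)
    # False rejection rate: bona fide trials scored below the threshold
    frr = np.array([(bona < th).mean() for th in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold
    far = np.array([(spoof >= th).mean() for th in thresholds])
    # EER is taken where the two error rates are closest
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


toy_eer, toy_threshold = compute_eer([0.1, 0.6, 0.8, 0.9], [0.2, 0.3, 0.4, 0.7])
```

With few trials the two error curves may not cross exactly at an observed score, so evaluation toolkits often interpolate between thresholds; this sketch simply averages the two rates at the closest point.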