
Voice Cloning and How to Detect It


Banks have begun using authentication technology based on voice biometrics for access to credit card accounts. From an information-security standpoint, such speech data is confidential and must be protected against compromise and de-identification. De-identification can be achieved with voice modification (cloning) techniques. The speaker demonstrates a software implementation of a voice cloning method, shows how a speaker recognition system can detect cloned voices, and presents study data on how the performance of a cloned-voice detector depends on the number of cepstral features used for training.

Published in: Technology


  1. Voice Cloning and its Detection
     Roman Kazantsev, Dilshod Poshshoev
  2. Voice Biometrics
     • Every person has unique voice biometrics, like a fingerprint;
     • Voice biometrics can be used for authorization to different systems (mobile device, smart house, bank account, …);
     • Voice biometrics is private and needs protection against voice cloning.
  3. What was done?
     • Neural-network-based voice cloning implementation using open-source software;
     • Employment of GMM-based speaker identification for detection of cloned voice.
  4. NN-based voice cloning architecture
     Data: AWT (source speaker) and SLT (target speaker) from CMU_ARCTIC.
     • WORLD vocoder: feature extraction from the source speech (aperiodicity, log-F0, spectrum);
     • Rastamat: Mel-FCC extraction from the spectrum;
     • Tiny DNN: neural network predicts target Mel-FCC from source Mel-FCC;
     • Linear conversion of log-F0;
     • Rastamat: inversion of predicted Mel-FCC back to a spectrum;
     • WORLD vocoder: synthesis of target speech from log-F0, aperiodicity and spectrum.
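The "linear conversion of log-F0" step is conventionally a mean-variance transform that maps the voiced log-F0 contour onto the target speaker's statistics (the slides' t_mean_logf0 and t_var_logf0). A minimal numpy sketch, assuming unvoiced frames are coded as 0 (the function name and voicing convention are ours, not from the slides):

```python
import numpy as np

def convert_logf0(src_logf0, t_mean_logf0, t_var_logf0):
    """Mean-variance linear conversion of voiced log-F0 frames
    from the source speaker's statistics to the target's."""
    voiced = src_logf0 > 0          # assume unvoiced frames are stored as 0
    out = np.zeros_like(src_logf0)  # keep unvoiced frames at 0
    s_mean = src_logf0[voiced].mean()
    s_std = src_logf0[voiced].std()
    # standardize against the source, then rescale to the target statistics
    out[voiced] = (src_logf0[voiced] - s_mean) / s_std * np.sqrt(t_var_logf0) + t_mean_logf0
    return out
```

By construction the converted voiced frames have exactly the target mean and variance, which is why only the two scalars t_mean_logf0 and t_var_logf0 need to be extracted from the target data.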
  5. Alignment of source and target Mel-FCC features using Dynamic Time Warping
     Example utterance arctic_a0001: "Author of the danger trail, Philip Steels, etc."
     Source speaker (AWT), target speaker (SLT).
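Because the two speakers utter the same sentence at different speeds, their Mel-FCC frame sequences must be time-aligned before they can serve as input/output training pairs. A self-contained numpy sketch of classic DTW with Euclidean frame distance (the slides do not show their implementation; this is the textbook algorithm):

```python
import numpy as np

def dtw_align(source, target):
    """Align two feature sequences (frames x coeffs) with classic DTW
    under Euclidean frame distance; returns paired (i, j) frame indices."""
    n, m = len(source), len(target)
    # pairwise frame distances, shape (n, m)
    dist = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # backtrack from the end to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Each pair (i, j) in the returned path yields one aligned training example (source frame i, target frame j) for the network.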
  6. Data mining routine (extract_features_training.m)
     Source: wav_names{} = {arctic_a0001.wav}, ts_intervals{} = {[0.65, 1.05, 1.12, 1.20]};
     Target: wav_names{} = {arctic_a0001.wav}, tt_intervals{} = {[0.20, 0.70, 0.80, 0.88]};
     Outputs: s_melfcc_train, t_melfcc_train, t_mean_logf0, t_var_logf0.
  7. Neural network for cloning
     A multilayer σ-activated perceptron [12, 40, 40, 12] is trained and used for prediction of cepstral coefficients.
     Training: s_melfcc_train (normalized) → t_melfcc_train (denormalized); prediction: s_melfcc_predict → t_melfcc_predict.
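The slide gives only the topology, so as an illustration here is a numpy sketch of the forward pass of a [12, 40, 40, 12] sigmoid perceptron together with the normalization/denormalization wrappers shown in the diagram. Assumptions on our part: a linear output layer (so denormalized cepstra are unbounded; the deck's Tiny DNN model may apply σ to the output as well), and random untrained weights, since training via backprop is omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """σ-activated perceptron [12, 40, 40, 12]; forward pass only."""
    def __init__(self, sizes=(12, 40, 40, 12), seed=42):
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(0, 0.1, (a, b))
                        for a, b in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        # hidden layers are σ-activated
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            x = sigmoid(x @ w + b)
        # assumed linear output layer
        return x @ self.weights[-1] + self.biases[-1]

def normalize(x, mean, std):
    """Applied to s_melfcc before the network."""
    return (x - mean) / std

def denormalize(x, mean, std):
    """Applied to the network output to recover t_melfcc."""
    return x * std + mean
```

Per-coefficient normalization keeps the sigmoid units out of saturation; the same statistics are inverted on the output side to restore the cepstral scale.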
  8. Synthesis (synthesis.m)
     Inputs: source.wav, t_melfcc_predict, t_mean_logf0, t_var_logf0; output: target.wav.
  9. Employment of a GMM-based speaker recognition tool for cloned voice detection
     GitHub link:
     Train example:
       -t enroll -i "f1 m1" -m model.out
       Label f1 has files f1arctic_a0001.wav, f1arctic_a0002.wav
       Label m1 has files m1arctic_a0001.wav, m1arctic_a0002.wav
       Start training... 0.545000076294 seconds
     Predict example:
       -t predict -i "f1/*.wav" -m model.out
       f1arctic_a0001.wav -> f1
       f1arctic_a0002.wav -> f1
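The enroll/predict workflow above amounts to fitting a density model per enrolled speaker and labeling new audio by maximum likelihood. As a minimal stand-in for the tool's multi-component GMM, the sketch below models each speaker with a single diagonal Gaussian over feature frames (a 1-component GMM); the function names are ours:

```python
import numpy as np

def enroll(features):
    """Fit a single diagonal Gaussian to a speaker's feature frames
    (a 1-component stand-in for the GMM the actual tool trains)."""
    mean = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # floor variance for stability
    return mean, var

def avg_log_likelihood(model, features):
    """Average per-frame log-likelihood under the diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
    return ll.sum(axis=1).mean()

def predict(models, features):
    """Return the enrolled label whose model best explains the frames."""
    return max(models, key=lambda label: avg_log_likelihood(models[label], features))
```

A real GMM adds a weighted mixture of such Gaussians per speaker (trained with EM), which captures the multimodal structure of cepstral features far better than a single component.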
  10. Experiment with detection of cloned voice & results
      Train: SLT (target speaker), arctic_a0001.wav, arctic_a0002.wav, arctic_a0003.wav, arctic_a0004.wav, arctic_a0005.wav, …, arctic_a0020.wav.
      Predict against the SLT model:

        wav name                      probability
        arctic_b0002_orig.wav         0.953
        arctic_b0002_NN.wav           0.765
        arctic_b0002_DBN.wav          0.892
        arctic_b0002_DBN_MLPG.wav     0.912
        arctic_b0002_LSTM.wav         0.745
        arctic_b0002_LSTM_MLPG.wav    0.769
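In this table every cloned sample scores below the genuine recording, so a simple probability threshold separates them; that is the basis for the conclusion's "meticulously selected threshold". A sketch of the decision rule, where 0.93 is purely an illustrative value sitting between the genuine score (0.953) and the best cloned score (0.912) in the table above, and would in practice be tuned on held-out genuine and cloned data:

```python
def is_cloned(probability, threshold=0.93):
    """Flag a sample as possibly cloned when the speaker model's
    match probability falls below the threshold. 0.93 is an
    illustrative choice, not a value from the experiment."""
    return probability < threshold
```

Note the trade-off: a threshold high enough to catch the best clone (0.912) also risks rejecting genuine recordings made in noisier conditions, so the threshold must balance false accepts against false rejects.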
  11. Conclusion
      1. Speaker recognition systems used for authorization should have a meticulously selected probability threshold to guard against cloned voices;
      2. Voice biometrics should be re-collected and updated in the database regularly, since the voice changes physiologically with age;
      3. Voice-biometric identification is a good addition to multi-factor authorization schemes.
  12. Link to our voice cloner sources:
  13. References
      • T. Nakashika, R. Takashima, T. Takiguchi, Y. Ariki. Voice Conversion in High-order Eigen Space Using Deep Belief Nets;
      • WORLD vocoder [1] (D4C edition [2]);
      • PLP and RASTA MATLAB library (Rastamat);
      • Tiny-dnn;
      • CMU_ARCTIC speech database.