The comparison between
 the Deletion-Based Method
and the Mixing-Based Method
     for Audio CAPTCHAs

  Takuya NISHIMOTO (Univ. Tokyo, Japan)
   Takayuki WATANABE (TWCU, Japan)
       Interspeech 2010 Mon-Ses2-P3

CAPTCHA
   Completely Automated Public Turing test
    to tell Computers and Humans Apart
       a popular security technique on the Web
            prevents automated programs from abusing services
       image-based CAPTCHAs
            image containing distorted characters
            cannot be used by persons with visual disabilities
       audio CAPTCHAs were created as an alternative
   goal: create better audio CAPTCHA tasks
       safeness: the difference in recognition performance between humans and machines
       usability: mental workload of humans in listening to speech



Performance gap model
   performance of machine should be lower
       than the intelligibility of human
   gap: safeness
       should be large
   exposed ratio (ER)
       0%: random answer
            chance-level; no gap
       100%: best guess
            easy for both; no gap
   practical condition
       0% < ER < 100%
   [Figure: Intelligibility (%), 0 to 100, plotted against Exposed Ratio (%) (provided information), 0 to 100, with one curve for Human and one for ASR]
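
As a minimal sketch of the gap model, the safeness at a given exposed ratio can be read as human intelligibility minus machine (ASR) accuracy. The function name and all scores below are hypothetical and are not taken from the experiments.

```python
def performance_gap(human_pct: float, machine_pct: float) -> float:
    """Safeness gap in percentage points: human intelligibility minus ASR accuracy."""
    return human_pct - machine_pct


# Hypothetical scores (%) per exposed ratio, for illustration only.
scores_by_er = {
    0: (10.0, 10.0),      # chance level for both: no gap
    50: (85.0, 40.0),     # practical condition: a gap can open
    100: (100.0, 100.0),  # everything exposed: easy for both, no gap
}

gaps = {er: performance_gap(h, m) for er, (h, m) in scores_by_er.items()}
best_er = max(gaps, key=gaps.get)  # exposed ratio with the largest gap
```

With these made-up numbers the gap vanishes at both ends of the ER axis and is largest in between, which is the shape the model predicts.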
Safeness: ER control
   machines are becoming stronger
       statistical ASR is the mainstream
       supervised machine learning (Hidden Markov Models)
       techniques to cope with noise
   CAPTCHA tasks should be created systematically
       they should not be created by trial and error
       controllability of the Exposed Ratio is essential
   Mixing-based method: best way to control ER?
       mixing noises / distorting signals
            can hide a portion of the information, however...
            the ER is difficult to measure, so performance is not easy to predict
       alternatives must be investigated

Usability: Mental workload
   CAPTCHAs should not increase mental workload
   the workload may increase if the tasks are
       difficult to listen to / memorize
   long tasks (many characters)
       difficult to remember
       safer, but higher mental workload
   requirements
       information can be obtained easily, in a short time
   investigation required
       human auditory sensation
       language cognition

Top-down knowledge
   incomplete stimulus
       knowledge helps to guess the information
   visual sensation
       if part of an image is missing, or part of a word is hidden,
       common knowledge can complement the image
            knowledge about the characters and the vocabulary
   speech perception
       if "word familiarity" is high: easy to guess
   phonemic restoration
       may help human listening



Deletion-based method
   delete some parts on the temporal axis, little by little
       if 30 msec out of every 100 msec period is replaced with
        silence, 30% of the information is deleted (D70)
       if the ratio of remaining sections goes down, the degree of
        listening difficulty may increase
   Exposed Ratio can be controlled easily
   however, the result is not easy to understand...
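
The deletion step can be sketched as follows, assuming the silenced section is the first 30 msec of each 100 msec period (the slides do not say where in the period it falls). The function names, the 16 kHz sample rate, and the dummy input are illustrative assumptions.

```python
def delete_periodically(samples, sample_rate, period_ms=100.0, deleted_ms=30.0):
    """Return a copy of `samples` with part of every period replaced by silence.

    Assumption: the silenced section is the first `deleted_ms` of each period.
    """
    period = round(sample_rate * period_ms / 1000.0)
    deleted = round(sample_rate * deleted_ms / 1000.0)
    return [0.0 if (i % period) < deleted else s for i, s in enumerate(samples)]


def exposed_ratio(period_ms=100.0, deleted_ms=30.0):
    """Fraction of each period that is left intact (0.7 for the D70 condition)."""
    return 1.0 - deleted_ms / period_ms


# One second of a dummy constant signal at 16 kHz (hypothetical input).
original = [1.0] * 16000
d70 = delete_periodically(original, sample_rate=16000)
kept = sum(1 for s in d70 if s != 0.0) / len(d70)  # fraction of samples kept
```

Because the exposed ratio is fixed by the two durations, it can be set directly, which is the controllability the slide refers to.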
   [Audio samples: deletion (original); Festival engine; KAL (HMM-based)]
Phonemic restoration
   interrupted speech and noise maskers combined
       the fence effect
       continuity of speech signal perceived
       may help human listening
       does not affect machine performance
   expected to enlarge the gap
       performance difference of human and machine
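
A minimal sketch of combining interrupted speech with a noise masker: the deleted sections are filled with noise instead of silence. The slides do not specify the noise type or level, so uniform white noise at a fixed amplitude is an assumption here, as are the function name, the position of the deleted section, and the dummy input.

```python
import random


def fill_gaps_with_noise(samples, sample_rate, period_ms=100.0, deleted_ms=30.0,
                         noise_amplitude=0.1, seed=0):
    """Replace the deleted part of every period with uniform white noise.

    Assumptions: the deleted section is the first `deleted_ms` of each period,
    and the masker is white noise with a fixed amplitude.
    """
    rng = random.Random(seed)
    period = round(sample_rate * period_ms / 1000.0)
    deleted = round(sample_rate * deleted_ms / 1000.0)
    return [
        rng.uniform(-noise_amplitude, noise_amplitude) if (i % period) < deleted else s
        for i, s in enumerate(samples)
    ]


# One second of a dummy constant signal at 16 kHz (hypothetical input).
speech = [1.0] * 16000
restored = fill_gaps_with_noise(speech, sample_rate=16000)
```

The exposed ratio is unchanged by this step: the same 70% of the samples survive, and only the content of the gaps differs.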

   [Audio sample: deletion + phonemic restoration]
NASA-TLX evaluation
   mental workload
       rating 6 subscales
            Mental, Physical, and Temporal Demands,
             Frustration, Effort, and Performance
       range: 0-100
   weights of subscales (6 to 1)
       for each participant
       rank ordering of
        how the 6 dimensions are related
        to the personal definition of workload
   weighted workload (WWL)
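
One way to read the weighting above is as a weighted mean of the six ratings, with weights 6 to 1 assigned by each participant's rank order. This is a sketch under that assumption; the function name, the example ratings, and the example ranking are hypothetical.

```python
SUBSCALES = ("Mental Demand", "Physical Demand", "Temporal Demand",
             "Frustration", "Effort", "Performance")


def weighted_workload(ratings, ranking):
    """Weighted mean of subscale ratings (each 0-100).

    `ranking` lists the subscales from most to least relevant; the first gets
    weight 6 and the last weight 1 (assumed mapping of the rank order).
    """
    if sorted(ranking) != sorted(SUBSCALES):
        raise ValueError("ranking must contain each subscale exactly once")
    weights = {name: len(ranking) - pos for pos, name in enumerate(ranking)}
    total_weight = sum(weights.values())  # 6 + 5 + 4 + 3 + 2 + 1 = 21
    return sum(weights[name] * ratings[name] for name in SUBSCALES) / total_weight


# Hypothetical ratings and ranking of one participant.
ratings = {"Mental Demand": 80, "Physical Demand": 10, "Temporal Demand": 50,
           "Frustration": 40, "Effort": 70, "Performance": 30}
ranking = ["Mental Demand", "Effort", "Temporal Demand",
           "Frustration", "Performance", "Physical Demand"]
wwl = weighted_workload(ratings, ranking)
```

Dividing by the sum of the weights keeps the WWL on the same 0-100 range as the individual ratings.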


Deletion vs Mixing (Exp1)
   objective: compare intelligibility and mental workload
       Deletion-Based Method (DBM)
       Mixing-Based Method (MBM)
            effect of SNR (signal-to-noise ratio) in MBM
   human intelligibility test
       75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
            Japanese recorded speech
       subjects: 15 (5 x 3) undergraduate students
       mental workload (WWL) by NASA-TLX
            normalized within each subject
            so that the average and SD become 50 and 10, respectively
   automatic speech recognition using HMM
       task: numbers (1-7 digits) in Japanese
       training: 8440 utterances, 18 states, 20 mixtures
       evaluation: 1001 utterances, sentence recognition
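
The within-subject normalization of WWL scores can be sketched as a linear rescaling to mean 50 and SD 10. The slides do not say whether the population or the sample SD was used; the population SD is assumed here, and the function name and raw scores are hypothetical.

```python
from statistics import mean, pstdev


def normalize_scores(scores, target_mean=50.0, target_sd=10.0):
    """Rescale one subject's scores so their mean is 50 and their SD is 10."""
    mu = mean(scores)
    sd = pstdev(scores)  # population SD (an assumption; see the note above)
    if sd == 0:
        raise ValueError("scores must not all be identical")
    return [target_mean + target_sd * (s - mu) / sd for s in scores]


# Hypothetical raw WWL scores of one subject.
raw = [35.0, 60.0, 70.0, 45.0]
normalized = normalize_scores(raw)
```

After this step, scores from subjects who use the rating scale differently can be compared on a common footing.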
Setup (Exp1)
   compare DBM and MBM within a person
        acoustic presentation: given by headphones
             at the subject’s preferred reference loudness level
   MBM disturbing signals
        utterances of Japanese sentences,
         fragmented into short periods, shuffled and combined
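
A minimal sketch of the mixing-based method: a disturbing signal is built by shuffling short fragments, then scaled and added to the speech so that the power ratio matches a target SNR. The fragment length, the random seed, the function names, and the sine-wave stand-ins for real recordings are all assumptions made for illustration.

```python
import math
import random


def shuffle_fragments(samples, fragment_len, seed=0):
    """Cut a signal into short fragments, shuffle them, and join them again."""
    fragments = [samples[i:i + fragment_len]
                 for i in range(0, len(samples), fragment_len)]
    random.Random(seed).shuffle(fragments)
    return [s for fragment in fragments for s in fragment]


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so that the speech-to-noise power ratio is `snr_db`."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Sine waves stand in for the target speech and the disturbing utterances.
target = [math.sin(0.1 * i) for i in range(1000)]
other = [math.sin(0.37 * i) for i in range(1000)]
disturbing = shuffle_fragments(other, fragment_len=50)
mm10 = mix_at_snr(target, disturbing, snr_db=-10.0)  # corresponds to SNR -10dB
```

The SNR fixes the relative power of the masker, but it does not state how much of the speech remains usable, which is why the exposed ratio is hard to read off in this method.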
Group   Trial 1 (D30)   Trial 2 (M0, Mm10, Mm20)
G1      DBM 30%         MBM SNR 0dB
G2      DBM 30%         MBM SNR -10dB
G3      DBM 30%         MBM SNR -20dB

[Figure: MBM (Exp1): sentence recognition using HTK (%), for M0, Mm10, and Mm20; vertical axis 0 to 80]
Performance (Exp1)
      DBM (T1): marginally significant (p<0.1) (G1>G2)
      DBM 30% task is harder than MBM 0dB, -10dB, -20dB
      MBM (T2): effect of SNR conditions is significant, however,
       only between 0dB & -10dB (p<0.05) (G1>G2)
      [Figure: performance per subject (s101-s105, s201-s205, s301-s305) in T1 and T2; vertical axis 30 to 100; panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB]
Workload difference (Exp1)
   WWL: individual difference cancelled
        subtraction of DBM (D30) score
         from MBM (M0, Mm10 and Mm20) score was performed
   WWL: MBM 0dB < DBM 30% ?
        no significance (ANOVA)
   MBM: task difficulty is not easy to control
   [Figure: WWL difference per subject (s101-s105, s201-s205, s301-s305); panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB]
   [Figure: average WWL difference; M0-D30: (16.2), Mm10-D30: 0.7, Mm20-D30: 1.0]
DBM exposed ratio (Exp2)
   DBM: Exposed Ratio can control the gap size
   DBM 30%
        gap is very large, however,
        workload is very high
   Significant difference (p<0.05)
   [Figure: Human Ave. (%) and Machine (%) at DBM 30%, 50%, and 70%; vertical axis 30 to 100]
   [Figure: Workload at DBM 30%, 50%, and 70%; vertical axis 30 to 70]
Discussion
   D30 (DBM) & Mm10 (MBM) can be the benchmarks
       for the purpose of comparison between MBM and DBM
            performance differences are close (43.7pt & 44.8pt)
            WWLs are also very close (WMm10 - WD30 = 0.7)
   [Figure: performance difference between human and machine (pt) for M0, Mm10, Mm20, D70, D50, and D30; vertical axis 0 to 80]
Conclusion
   audio CAPTCHA task using phonemic restoration
       deletion-based method (DBM)
   evaluation of CAPTCHA task
       performance + mental workload (NASA-TLX)
   comparison between DBM and MBM
       DBM: easier to control the task
   future work
       improve the noise
       investigation of phonemic restoration
            really improving performance? or only decreasing workload?
       word familiarity, speech rate, synthesized speech, ...

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Nishimoto Interspeech 2010 v3

  • 1. The comparison between the Deletion-Based Method and the Mixing-Based Method for Audio CAPTCHAs
      Takuya NISHIMOTO (Univ. Tokyo, Japan), Takayuki WATANABE (TWCU, Japan)
      Interspeech 2010 Mon-Ses2-P3
  • 2. CAPTCHA
      • Completely Automated Public Turing test to tell Computers and Humans Apart
          • popular security technique on the Web
              • prevents automated programs from abusing services
          • image-based CAPTCHAs
              • image containing distorted characters
              • prevents use by persons with visual disabilities
          • audio CAPTCHAs were created
      • create better audio CAPTCHA tasks
          • safeness: the difference in recognition performance
          • usability: mental workload of humans when listening to speech
  • 3. Performance gap model
      • performance of machine should be lower than the intelligibility of human
      • gap: safeness
          • should be large
      • exposed ratio (ER)
          • 0%: random answer
              • chance-level; no gap
          • 100%: best guess
              • easy for both; no gap
      • practical condition
          • 0 < ER < 100
      • [Figure: Intelligibility (%) of Human and ASR versus Exposed Ratio (%) (Provided Information); both axes run from 0 to 100]
  • 4. Safeness: ER control
      • machine is becoming strong
          • statistical ASR method is the mainstream
          • supervised machine learning (Hidden Markov Models)
          • techniques to cope with the noise
      • CAPTCHA tasks should be created systematically
          • it should not be created by trial and error
          • controllability of Exposed Ratio is essential
      • Mixing-based method: best way to control ER?
          • mixing noises / distorting signals
          • can hide portion of information, however...
              • difficult to measure the ER, performance is not easy to predict
          • alternatives must be investigated
  • 5. Usability: Mental workload
      • CAPTCHAs should not increase mental workload
      • the workload may increase if the tasks are...
          • difficult to listen to / memorize
          • long (many characters)
              • difficult to remember
              • safer, but higher mental workload
      • requirements
          • information can be obtained easily and in a short time
      • investigation required
          • human auditory sensation
          • language cognition
  • 6. Top-down knowledge
      • incomplete stimulus
          • knowledge helps to guess the information
      • visual sensation
          • if part of image is missing, or part of the word is hidden
          • common knowledge can complement image
              • about the character and the vocabulary
      • speech perception
          • if "word familiarity" is high: easy to guess
          • phonemic restoration
              • may help the human listening
  • 7. Deletion-based method
      • delete some parts on the temporal axis little by little
          • if 30 msec out of every 100 msec period is replaced with silence, 30% of the information is deleted (D70)
          • if the ratio of remaining sections goes down, the degree of listening difficulty may increase
      • Exposed Ratio can be controlled easily
      • however, not easy to understand...
      • [Audio sample labels: deletion, (original), Festival engine KAL, (HMM-based)]
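The deletion step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the 16 kHz dummy signal, and the choice to silence the first part of each period (rather than the last) are assumptions made here.

```python
def delete_periodically(samples, sample_rate, period_ms=100, deleted_ms=30):
    """Replace the first `deleted_ms` of every `period_ms` window with silence.

    Which part of each window is silenced is an assumption; the slide only
    gives the proportion (30 msec out of every 100 msec).
    """
    period = sample_rate * period_ms // 1000    # window length in samples
    deleted = sample_rate * deleted_ms // 1000  # silenced samples per window
    return [0.0 if i % period < deleted else s for i, s in enumerate(samples)]


def exposed_ratio(period_ms=100, deleted_ms=30):
    """Percentage of the signal left intact (70.0 corresponds to D70)."""
    return 100.0 * (period_ms - deleted_ms) / period_ms


# One second of a constant dummy signal at 16 kHz (illustrative input only).
signal = [1.0] * 16000
d70 = delete_periodically(signal, 16000, period_ms=100, deleted_ms=30)
```

Because the exposed ratio follows directly from the two durations, the task difficulty can be set before any listening test, which is the controllability the slide refers to.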
  • 8. Phonemic restoration
      • interrupted speech and noise maskers combined
          • the fence effect
          • continuity of speech signal perceived
      • may help human listening
      • does not affect machine performance
      • expected to enlarge the gap
          • performance difference of human and machine
      • [Audio sample label: deletion + phonemic restoration]
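Combining interrupted speech with a noise masker can be sketched as follows. The slides do not state the noise type or level, so the uniform white noise, the `noise_amplitude` value, and the function name below are all assumptions chosen for illustration.

```python
import random


def fill_gaps_with_noise(samples, sample_rate, period_ms=100, deleted_ms=30,
                         noise_amplitude=0.5, seed=0):
    """Replace the first `deleted_ms` of every `period_ms` window with noise.

    Uniform white noise and its amplitude are hypothetical choices; the
    original work may have used a different masker.
    """
    rng = random.Random(seed)                   # seeded for repeatable output
    period = sample_rate * period_ms // 1000    # window length in samples
    deleted = sample_rate * deleted_ms // 1000  # masked samples per window
    out = []
    for i, s in enumerate(samples):
        if i % period < deleted:
            out.append(rng.uniform(-noise_amplitude, noise_amplitude))
        else:
            out.append(s)                       # exposed speech is untouched
    return out


# Constant dummy signal at 16 kHz (illustrative input only).
masked = fill_gaps_with_noise([1.0] * 16000, 16000)
```

The exposed sections are identical to those of the plain deletion method, so the exposed ratio is unchanged; only the content of the gaps differs.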
  • 9. NASA-TLX evaluation
      • mental workload
      • rating 6 subscales
          • Mental, Physical, and Temporal Demands, Frustration, Effort, and Performance
          • range: 0-100
      • weights of subscales (6 to 1)
          • for each participant
          • assigned by ranking how closely each of the 6 dimensions relates to the participant's personal definition of workload
      • weighted workload (WWL)
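A weighted workload score of this kind can be computed as a weighted mean of the six ratings. The sketch below assumes that the rank-based weights 6 to 1 are normalized by their sum (21); the slide does not spell out the normalization, and the identifiers are hypothetical.

```python
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")


def weighted_workload(ratings, ranking):
    """Weighted mean of six 0-100 ratings.

    `ranking` lists the subscales from most to least relevant for this
    participant; they receive weights 6, 5, ..., 1. Dividing by the sum of
    the weights (21) is an assumption made for this sketch.
    """
    if sorted(ranking) != sorted(SUBSCALES):
        raise ValueError("ranking must contain each subscale exactly once")
    weights = {name: 6 - position for position, name in enumerate(ranking)}
    total = sum(weights[name] * ratings[name] for name in SUBSCALES)
    return total / sum(weights.values())


# Hypothetical participant: values are illustrative, not from the experiment.
example_ratings = {"mental": 80, "physical": 10, "temporal": 40,
                   "performance": 30, "effort": 60, "frustration": 50}
example_ranking = ["mental", "effort", "frustration",
                   "temporal", "performance", "physical"]
wwl = weighted_workload(example_ratings, example_ranking)
```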
  • 10. Deletion vs Mixing (Exp1)
      • objective: compare intelligibility and mental workload
          • Deletion-Based Method (DBM)
          • Mixing-Based Method (MBM)
          • effect of SNR (signal-to-noise ratio) in MBM
      • human intelligibility test
          • 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
          • Japanese recorded speech
          • subjects: 15 (5 x 3) undergraduate students
      • mental workload (WWL) by NASA-TLX
          • normalized within every subject
          • their average and SD become 50 and 10 respectively
      • automatic speech recognition using HMM
          • task: numbers (1-7 digits) in Japanese
          • training: 8440 utterances, 18 states, 20 mixtures
          • evaluation: 1001 utterances, sentence recognition
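The per-subject normalization of WWL scores (average 50, SD 10) amounts to rescaling each subject's z-scores. A minimal sketch, assuming the population standard deviation is used (the slide does not say which estimator was applied):

```python
import statistics


def normalize_within_subject(scores, target_mean=50.0, target_sd=10.0):
    """Rescale one subject's scores to the given mean and SD.

    Uses the population SD (statistics.pstdev); whether the authors used the
    population or the sample SD is not stated, so this is an assumption.
    """
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("scores are constant; SD is zero")
    return [target_mean + target_sd * (x - mean) / sd for x in scores]


# Hypothetical raw WWL scores of one subject (illustrative values only).
normalized = normalize_within_subject([30.0, 45.0, 60.0, 75.0])
```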
  • 11. Setup (Exp1)
      • compare DBM and MBM within a person
      • acoustic presentation: given by headphone
          • at the subject's preferred reference loudness level
      • MBM disturbing signals
          • utterances of Japanese sentences fragmented as short periods, shuffled and combined

      Group   Trial 1 (D30)   Trial 2 (M0, Mm10, Mm20)
      G1      DBM 30%         MBM SNR 0 dB
      G2      DBM 30%         MBM SNR -10 dB
      G3      DBM 30%         MBM SNR -20 dB

      • [Figure: MBM (Exp1), sentence recognition using HTK (%) for M0, Mm10, Mm20]
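The mixing-based condition can be sketched as two steps: build a disturbing signal by shuffling short fragments of other speech, then scale it so that the mixture has the requested SNR. The fragment length, the use of average power over the whole utterance, and all function names are assumptions of this sketch; the slides do not specify them.

```python
import math
import random


def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def shuffle_fragments(speech, fragment_len, seed=0):
    """Cut `speech` into fragments of `fragment_len` samples and shuffle them.

    `fragment_len` is a hypothetical parameter; the slide only says the
    sentences were fragmented into short periods.
    """
    fragments = [speech[i:i + fragment_len]
                 for i in range(0, len(speech), fragment_len)]
    random.Random(seed).shuffle(fragments)
    return [v for fragment in fragments for v in fragment]


def mix_at_snr(target, disturbance, snr_db):
    """Add `disturbance` to `target`, scaled to the requested SNR in dB."""
    if len(disturbance) < len(target):
        raise ValueError("disturbance is shorter than target")
    disturbance = disturbance[:len(target)]
    ratio = 10 ** (snr_db / 10)  # desired target power / disturbance power
    gain = math.sqrt(mean_power(target) / (mean_power(disturbance) * ratio))
    return [t + gain * d for t, d in zip(target, disturbance)]


# Synthetic stand-ins for speech (illustrative input only).
target = [math.sin(0.10 * i) for i in range(1000)]
other = [math.sin(0.37 * i) for i in range(1000)]
mm10 = mix_at_snr(target, shuffle_fragments(other, 50), snr_db=-10)
```

Unlike the deletion method, the SNR fixes only the relative power of the two signals, not how much of the target remains recoverable, which is consistent with the slide's point that ER is hard to measure under mixing.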
  • 12. Performance (Exp1)
      • DBM (T1): marginally significant (p<0.1) (G1>G2)
      • DBM 30% task is harder than MBM 0 dB, -10 dB, -20 dB
      • MBM (T2): effect of SNR conditions is significant, however, only between 0 dB & -10 dB (p<0.05) (G1>G2)
      • [Figure: T1 and T2 performance per subject, in three panels: DBM 30% vs MBM 0 dB (s101-s105), DBM 30% vs MBM -10 dB (s201-s205), DBM 30% vs MBM -20 dB (s301-s305); vertical axis 30 to 100]
  • 13. Workload difference (Exp1)
      • WWL: individual difference cancelled
          • the DBM (D30) score was subtracted from the MBM (M0, Mm10 and Mm20) score
      • WWL: MBM 0 dB < DBM 30% ?
          • no significance (ANOVA)
      • MBM: task difficulty is not easy to control
      • [Figure: WWL difference per subject, in three panels: DBM 30% vs MBM 0 dB (s101-s105), DBM 30% vs MBM -10 dB (s201-s205), DBM 30% vs MBM -20 dB (s301-s305)]
      • [Figure: average WWL difference for M0-D30, Mm10-D30, Mm20-D30; value labels (16.2), 0.7, 1.0]
  • 14. DBM exposed ratio (Exp2)
      • DBM: Exposed Ratio can control the gap size
      • DBM 30% gap is very large, however, workload is very high.
      • [Figure: Human Ave. (%) and Machine (%) at DBM 30%, 50%, 70%]
      • [Figure: Workload at DBM 30%, 50%, 70%; significant difference (p<0.05)]
  • 15. Discussion
      • D30 (DBM) & Mm10 (MBM) can be the benchmarks
          • for the purpose of comparison between MBM and DBM
          • performance differences are close (43.7 pt & 44.8 pt)
          • WWL values are also very close (WMm10 - WD30 = 0.7)
      • [Figure: performance difference between human and machine (pt) for M0, Mm10, Mm20, D70, D50, D30]
  • 16. Conclusion
      • audio CAPTCHA task using phonemic restoration
          • deletion-based method (DBM)
      • evaluation of CAPTCHA task
          • performance + mental workload (NASA-TLX)
      • comparison between DBM and MBM
          • DBM: easier to control the task
      • future works
          • improve the noise
          • investigation of phonemic restoration
              • really improving performance? only decreasing workload?
          • word familiarity, speech rate, synthesized speech, ...