Nishimoto icchp2010

Takuya Nishimoto, Software Developer at Shuaruta Inc.
Evaluations of Deletion-Based Method
     and Mixing-Based Method
         for Audio CAPTCHAs


 Takuya NISHIMOTO (Univ. Tokyo, Japan)
  Takayuki WATANABE (TWCU, Japan)
              @nishimotz

CAPTCHA
   Completely Automated Public Turing test
    to tell Computers and Humans Apart
       a popular security technique on the Web
            prevents automated programs from abusing services
       image-based CAPTCHAs
            image containing distorted characters
            prevents use by people with visual disabilities
       audio CAPTCHAs were created
   create better audio CAPTCHA tasks
       safeness: the difference in recognition performance between human and machine
       usability: mental workload of a human listening to speech



Performance gap model
   performance of machine should be lower
       than the intelligibility of human
   gap: safeness
       should be large
   exposed ratio (ER)
       0%: random answer
            chance-level; no gap
       100%: best guess
            easy for both; no gap
   practical condition
       0% < ER < 100%
   [Figure: intelligibility (%) versus exposed ratio (%), i.e. provided information, both axes 0 to 100; curves for Human and ASR]
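The gap model above can be sketched in a few lines of Python. This is only an illustration: the numbers in `example_curve` are made up for the example and are not results from the study, and the function names are hypothetical.

```python
# Hypothetical intelligibility values (%), made up for illustration only;
# they are NOT results from the study.
# Each entry maps an exposed ratio (%) to (human score, machine score).
example_curve = {
    0: (10.0, 10.0),      # chance level for both, so no gap
    50: (80.0, 30.0),     # partial information: a gap can open up
    100: (100.0, 100.0),  # easy for both, so no gap
}


def performance_gap(human_pct, machine_pct):
    """Safeness in this model: human intelligibility minus machine accuracy."""
    return human_pct - machine_pct


def best_exposed_ratio(curve):
    """Return the exposed ratio whose human/machine gap is largest."""
    return max(curve, key=lambda er: performance_gap(*curve[er]))


for er in sorted(example_curve):
    print(er, performance_gap(*example_curve[er]))
print("largest gap at ER =", best_exposed_ratio(example_curve))
```

With measured human and machine scores in place of the placeholder values, the same lookup would point to the exposed ratio that is safest under this model.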
Safeness: ER control
   machines are becoming stronger
       statistical ASR methods are the mainstream
       supervised machine learning (Hidden Markov Models)
       techniques to cope with noise
   CAPTCHA tasks should be created systematically
       they should not be created by trial and error
       controllability of the Exposed Ratio is essential
   Mixing-based method: best way to control ER?
       mixing noises / distorting signals
            can hide a portion of the information, however...
            the ER is difficult to measure, so performance is not easy to predict
       alternatives must be investigated
Usability: Mental workload
   CAPTCHAs should not increase mental workload
   the workload may increase if the task is...
       difficult to listen to / memorize
   long task (many characters)
       difficult to remember
       safer, but higher mental workload
   requirements
       information can be obtained easily and in a short time
   investigation required
       human auditory sensation
       language cognition

Top-down knowledge
   incomplete stimulus
       knowledge helps to guess the information
   visual sensation:
       if part of an image is missing, or part of a word is hidden
       common knowledge can complement the image
            knowledge about the characters and the vocabulary
   speech perception:
       if "word familiarity" is high: easy to guess
   phonemic restoration
       may help human listening



Deletion-based method
   delete some parts on the temporal axis, little by little
       if 30 msec out of every 100 msec period is replaced
        with silence, 30% of the information is deleted
       if the ratio of remaining sections goes down, the degree of
        listening difficulty may increase
   Exposed Ratio can be controlled easily
   however, the result is not easy to understand...
   [Audio samples: deletion (original); Festival engine; KAL (HMM-based)]
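As a rough sketch of the deletion step described on this slide, the following Python function silences 30 msec out of every 100 msec. The function names, the use of plain lists of floats, and the choice to silence the first part of each period are assumptions for illustration, not details taken from the slides.

```python
def delete_periodically(samples, sample_rate, period_ms=100, deleted_ms=30):
    """Silence the first `deleted_ms` of every `period_ms` window.

    `samples` is a list of floats. Where the silent part sits inside each
    window is an assumption made for this sketch.
    """
    period = sample_rate * period_ms // 1000    # window length in samples
    silent = sample_rate * deleted_ms // 1000   # silent part in samples
    if period <= 0 or not 0 <= silent <= period:
        raise ValueError("need 0 <= deleted_ms <= period_ms and a non-empty window")
    return [0.0 if i % period < silent else x for i, x in enumerate(samples)]


def remaining_fraction(original, processed):
    """Fraction of samples left untouched by the deletion."""
    kept = sum(1 for a, b in zip(original, processed) if a == b)
    return kept / len(original)


# Usage: a constant dummy signal, 200 msec at 1000 samples per second.
signal = [1.0] * 200
gated = delete_periodically(signal, sample_rate=1000)
print(remaining_fraction(signal, gated))
```

Because the silent duration is set directly, the remaining fraction follows from the two parameters alone, which is the controllability the slide refers to.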
Phonemic restoration
   interrupted speech and noise maskers combined
       the fence effect
       continuity of the speech signal is perceived
       may help human listening
       does not affect machine performance
   expected to enlarge the gap
       performance difference between human and machine
   [Audio sample: deletion + phonemic restoration]
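One way to combine interrupted speech with a noise masker is to fill each deleted section with noise instead of silence. The sketch below does this under stated assumptions: uniform white noise, the `noise_level` value, and the function name are all hypothetical, since the slides do not specify the masker.

```python
import random


def fill_gaps_with_noise(samples, sample_rate, period_ms=100, deleted_ms=30,
                         noise_level=0.5, seed=0):
    """Replace the first `deleted_ms` of every `period_ms` window with noise.

    Uniform white noise and `noise_level` are assumptions of this sketch.
    """
    rng = random.Random(seed)                   # seeded for repeatability
    period = sample_rate * period_ms // 1000    # window length in samples
    masked = sample_rate * deleted_ms // 1000   # masked part in samples
    if period <= 0 or not 0 <= masked <= period:
        raise ValueError("need 0 <= deleted_ms <= period_ms and a non-empty window")
    out = []
    for i, x in enumerate(samples):
        if i % period < masked:
            out.append(rng.uniform(-noise_level, noise_level))  # noise masker
        else:
            out.append(x)                                       # speech kept as-is
    return out


# Usage: same dummy signal as a 200 msec clip at 1000 samples per second.
signal = [1.0] * 200
masked_signal = fill_gaps_with_noise(signal, sample_rate=1000)
```

The retained speech sections are identical to those of the silence-based version; only the content of the gaps changes.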




NASA-TLX evaluation
   mental workload
       ratings on 6 subscales
            Mental Demand, Physical Demand, Temporal
             Demand, Frustration, Effort, and
             Performance
       range: 0-100
   weights of subscales (6 to 1)
       set for each participant
       by ranking the 6 dimensions according to
        how closely each relates to the participant's
        personal definition of workload
   weighted workload (WWL)
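A minimal sketch of the weighted workload computation, assuming WWL is the weighted mean of the six ratings with rank weights 6 down to 1 (which sum to 21). The subscale keys and the function name are hypothetical.

```python
SUBSCALES = ("mental", "physical", "temporal",
             "frustration", "effort", "performance")


def weighted_workload(ratings, ranking):
    """Weighted mean of subscale ratings.

    ratings: dict mapping each subscale to a rating in 0..100.
    ranking: list of all subscales, most relevant to workload first;
             it receives the weights 6, 5, ..., 1 in that order.
    """
    if sorted(ranking) != sorted(SUBSCALES):
        raise ValueError("ranking must contain each subscale exactly once")
    n = len(ranking)
    weights = {name: n - i for i, name in enumerate(ranking)}  # 6, 5, ..., 1
    total = sum(weights.values())                              # 21
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / total


# Usage with placeholder ratings (not data from the study).
ratings = {name: 50.0 for name in SUBSCALES}
print(weighted_workload(ratings, list(SUBSCALES)))
```

Because the weights are normalized by their sum, the WWL stays in the same 0 to 100 range as the ratings.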

Deletion vs Mixing (Exp1)
   objective: compare intelligibility and mental workload
       Deletion-Based Method (DBM)
       Mixing-Based Method (MBM)
            effect of SNR (signal-to-noise ratio) in MBM
   human intelligibility test
       75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
            Japanese recorded speech
       subjects: 15 undergraduate students (3 groups of 5)
       mental workload (WWL) by NASA-TLX
            normalized within each subject
            so that each subject's average and SD become 50 and 10, respectively
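The within-subject normalization can be sketched as a rescaling of one subject's WWL scores to mean 50 and SD 10. Using the population standard deviation (`pstdev`) rather than the sample one is an assumption of this sketch, as is the function name.

```python
from statistics import mean, pstdev


def normalize_within_subject(scores, target_mean=50.0, target_sd=10.0):
    """Rescale one subject's scores to the target mean and SD."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD: an assumption of this sketch
    if sd == 0:
        # All scores identical: every score maps to the target mean.
        return [target_mean] * len(scores)
    return [target_mean + target_sd * (x - m) / sd for x in scores]


# Usage with placeholder scores (not data from the study).
print(normalize_within_subject([20.0, 40.0, 60.0]))
```

After this step, scores from different subjects share a common scale, so a subject who rates everything high is not counted as more loaded than one who rates everything low.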



Setup (Exp1)
   compare DBM and MBM within each subject
       acoustic presentation: through headphones
            at the subject’s preferred loudness level
   MBM disturbing signals
       utterances of Japanese sentences,
        fragmented into short segments, shuffled, and recombined
    Group   Trial 1            Trial 2
    G1      D30: DBM 30%       M0: MBM SNR 0 dB
    G2      D30: DBM 30%       Mm10: MBM SNR -10 dB
    G3      D30: DBM 30%       Mm20: MBM SNR -20 dB
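The mixing conditions can be sketched as follows: a disturbing signal is built by cutting utterances into fragments, shuffling them, and joining them, and it is then scaled so that the mixture has a target SNR. The fragment length, the looping of the disturbance to match the target length, and all function names are assumptions for illustration.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def make_disturbance(utterances, fragment_len, length, seed=0):
    """Cut utterances into fragments, shuffle them, and join to `length` samples."""
    rng = random.Random(seed)
    fragments = [u[i:i + fragment_len]
                 for u in utterances
                 for i in range(0, len(u), fragment_len)]
    rng.shuffle(fragments)
    flat = [v for f in fragments for v in f]
    if not flat:
        raise ValueError("no disturbance material given")
    repeats = -(-length // len(flat))   # ceiling division: loop if too short
    return (flat * repeats)[:length]


def mix_at_snr(target, disturbance, snr_db):
    """Add the disturbance to the target, scaled to the requested SNR in dB."""
    gain = math.sqrt(power(target) / (power(disturbance) * 10 ** (snr_db / 10)))
    return [t + gain * d for t, d in zip(target, disturbance)]


# Usage with synthetic tones standing in for recorded speech.
target = [math.sin(0.1 * i) for i in range(1000)]
pool = [[math.sin(0.37 * i) for i in range(300)],
        [math.cos(0.2 * i) for i in range(500)]]
disturbance = make_disturbance(pool, fragment_len=50, length=len(target), seed=1)
mixed = mix_at_snr(target, disturbance, snr_db=-10)
```

At -10 dB the disturbance carries ten times the power of the target speech; at 0 dB the two are equal.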

Performance (Exp1)
      DBM (T1): difference between groups is marginally significant (p<0.1) (G1>G2)
      DBM 30% task is harder than MBM at 0 dB, -10 dB, and -20 dB
      MBM (T2): effect of SNR condition is significant, however,
      only between 0 dB and -10 dB (p<0.05) (G1>G2)
      [Figure: T1 and T2 performance for each subject (s101-s105, s201-s205, s301-s305); panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB; vertical axis: 30 to 100]

Workload (Exp1)
   WWL: individual differences cancelled
       the DBM (D30) score was subtracted
        from the MBM (M0, Mm10, or Mm20) score
   WWL: MBM 0 dB < DBM 30%?
       no significance (ANOVA)
   MBM: task difficulty is not easy to control
   [Figure: WWL difference (MBM minus DBM) for each subject (s101-s105, s201-s205, s301-s305); panels: DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, DBM 30% vs MBM -20dB; vertical axis: -60 to 20]


Human vs Machine (Exp2)
   the deletion-based method (DBM) is evaluated
   automatic speech recognition using HMM
      task: numbers (1-7 digits) in Japanese
      training: 8440 utterances, 18 states, 20 mixtures
      evaluation: 1001 utterances, sentence recognition
   human intelligibility test
      75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
      subjects: 17 undergraduate students
      mental workload (WWL) by NASA-TLX
          normalized within each subject




Results (Exp2)
   DBM: the Exposed Ratio can control the gap size
   DBM 30%
       gap is very large, however, workload is very high
   significant difference (p<0.05)
   [Figure, left: Human Ave. (%) and Machine (%) at DBM 30%, 50%, and 70%; vertical axis: 30 to 100]
   [Figure, right: Workload at DBM 30%, 50%, and 70%; vertical axis: 30 to 70]


Conclusion
   audio CAPTCHA task using phonemic restoration
       deletion-based method (DBM)
   evaluation of CAPTCHA task
       performance + mental workload (NASA-TLX)
   comparison between DBM and MBM
       DBM: easier to control the task
   future work
       ASR evaluation of the mixing-based method
       improve the noise
       investigation of phonemic restoration
            really improving performance? only decreasing workload?
       word familiarity, speech rate, synthesized speech, ...