Nishimoto icchp2010


  1. Evaluations of Deletion-Based Method and Mixing-Based Method for Audio CAPTCHAs
     Takuya NISHIMOTO (Univ. Tokyo, Japan), Takayuki WATANABE (TWCU, Japan)
     @nishimotz
  2. CAPTCHA
     - Completely Automated Public Turing test to tell Computers and Humans Apart
     - a popular security technique on the Web
       - prevents automated programs from abusing Web services
     - image-based CAPTCHAs
       - an image containing distorted characters
       - prevents use by persons with visual disabilities
       - audio CAPTCHAs were created as an alternative
     - goal: create better audio CAPTCHA tasks
       - safeness: the difference in recognition performance between humans and machines
       - usability: the mental workload of humans listening to the speech
  3. Performance gap model
     - machine performance should be lower than human intelligibility
     - gap: safeness; should be large
     - exposed ratio (ER)
       - 0%: random answer; chance level, no gap
       - 100%: best guess; easy for both, no gap
     - practical condition: 0 < ER < 100
     [Figure: Intelligibility (%) of Human and ASR, from 0 to 100, plotted against Exposed Ratio (%) (provided information), from 0 to 100]
  4. Safeness: ER control
     - machines are becoming stronger
       - statistical ASR is the mainstream
       - supervised machine learning (Hidden Markov Models)
       - techniques to cope with noise
     - CAPTCHA tasks should be created systematically
       - not by trial and error
       - controllability of the Exposed Ratio is essential
     - Mixing-based method: the best way to control ER?
       - mixing noises / distorting signals
       - can hide a portion of the information, however...
       - the ER is difficult to measure, so performance is not easy to predict
       - alternatives must be investigated
  5. Usability: Mental workload
     - CAPTCHAs should not increase mental workload
     - the workload may increase if the task is
       - difficult to listen to or memorize
       - long (many characters): difficult to remember; safer, but higher mental workload
     - requirements
       - information can be obtained easily and in a short time
     - investigation required
       - human auditory sensation
       - language cognition
  6. Top-down knowledge
     - incomplete stimulus: knowledge helps to guess the information
     - visual sensation
       - if part of an image is missing, or part of a word is hidden
       - common knowledge about the characters and the vocabulary can complement the image
     - speech perception
       - if "word familiarity" is high: easy to guess
       - phonemic restoration may help human listening
  7. Deletion-based method
     - delete parts of the signal along the temporal axis, little by little
       - if 30 msec out of every 100 msec period is replaced with silence, 30% of the information is deleted
       - as the ratio of remaining sections goes down, the listening difficulty may increase
     - Exposed Ratio can be controlled easily
     - however, the result is not easy to understand...
     [Demo labels: deletion, (original); Festival engine KAL (HMM-based)]
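
The deletion rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the choice to silence the first part of each period, and the representation of audio as a list of float samples are all assumptions.

```python
def delete_periodically(samples, sample_rate, period_ms=100, deleted_ms=30):
    """Replace `deleted_ms` out of every `period_ms` with silence.

    Assumption: the silenced part is the start of each period; the slide
    does not say where inside the period the deletion falls.
    """
    period = int(sample_rate * period_ms / 1000)
    deleted = int(sample_rate * deleted_ms / 1000)
    out = list(samples)
    for start in range(0, len(out), period):
        for i in range(start, min(start + deleted, len(out))):
            out[i] = 0.0  # silence
    return out


def exposed_ratio(period_ms=100, deleted_ms=30):
    """Percentage of the time axis that is left intact."""
    return 100.0 * (period_ms - deleted_ms) / period_ms
```

With the slide's numbers (30 msec deleted per 100 msec), `exposed_ratio()` gives 70.0, which is why the ER is easy to control with this method: it follows directly from the two durations.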
  8. Phonemic restoration
     - interrupted speech and noise maskers are combined
       - the fence effect: continuity of the speech signal is perceived
     - may help human listening
     - does not affect machine performance
     - expected to enlarge the gap (the performance difference between human and machine)
     [Demo label: deletion + phonemic restoration]
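
A rough sketch of combining interrupted speech with a noise masker is given below. The slide does not specify the masker, so uniform white noise, the `noise_amplitude` parameter, and the placement of the interruption at the start of each period are assumptions made for illustration only.

```python
import random


def fill_deletions_with_noise(samples, sample_rate, period_ms=100,
                              deleted_ms=30, noise_amplitude=0.1, seed=0):
    """Interrupt the speech periodically and put a noise masker in each gap.

    Assumptions: the gap is the first `deleted_ms` of every `period_ms`,
    and the masker is uniform white noise in [-noise_amplitude, +noise_amplitude].
    """
    rng = random.Random(seed)
    period = int(sample_rate * period_ms / 1000)
    deleted = int(sample_rate * deleted_ms / 1000)
    out = list(samples)
    for start in range(0, len(out), period):
        for i in range(start, min(start + deleted, len(out))):
            out[i] = rng.uniform(-noise_amplitude, noise_amplitude)
    return out
```

The speech information removed is the same as with silence; only the content of the gap changes, which is what the fence effect relies on.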
  9. NASA-TLX evaluation
     - mental workload
     - rating on 6 subscales
       - Mental, Physical, and Temporal Demands, Frustration, Effort, and Performance
       - range: 0-100
     - weights of subscales (6-1)
       - for each participant
       - obtained by ordering the 6 dimensions by how strongly they relate to the participant's personal definition of workload
     - weighted workload (WWL)
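
One way to compute a weighted workload from rank-order weights is sketched below. It assumes that WWL is the weighted average of the six ratings, with weight 6 for the top-ranked subscale down to 1 for the last; the slide gives only the weight range, so the exact formula and all names here are assumptions.

```python
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")


def weighted_workload(ratings, ranking):
    """Weighted average of subscale ratings (each 0-100).

    ratings: dict mapping subscale name to its rating.
    ranking: subscale names ordered from most to least related to the
             participant's own notion of workload (weights 6, 5, ..., 1).
    """
    if sorted(ranking) != sorted(SUBSCALES):
        raise ValueError("ranking must contain each subscale exactly once")
    weights = {name: len(ranking) - pos for pos, name in enumerate(ranking)}
    total = sum(weights.values())  # 6 + 5 + ... + 1 = 21
    return sum(weights[name] * ratings[name] for name in SUBSCALES) / total
```

Because the weights are set per participant, two participants with identical ratings can still end up with different WWL scores.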
  10. Deletion vs Mixing (Exp1)
      - objective: compare intelligibility and mental workload
        - Deletion-Based Method (DBM)
        - Mixing-Based Method (MBM)
        - effect of SNR (signal-to-noise ratio) in MBM
      - human intelligibility test
        - 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
        - Japanese recorded speech
        - subjects: 15 (5 x 3) undergraduate students
      - mental workload (WWL) by NASA-TLX
        - normalized within every subject so that the average and SD become 50 and 10, respectively
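
The within-subject normalization can be sketched as a linear rescaling of each subject's own scores. The slide does not say which SD estimator was used, so the population SD (`statistics.pstdev`) is an assumption here, as is the function name.

```python
import statistics


def normalize_within_subject(scores, target_mean=50.0, target_sd=10.0):
    """Rescale one subject's WWL scores to the target mean and SD."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # assumption: population SD
    if sd == 0:
        # All scores identical: every score sits at the mean.
        return [target_mean for _ in scores]
    return [target_mean + target_sd * (s - mean) / sd for s in scores]
```

After this step a score of 60 means "one SD above this subject's own average", which makes scores comparable across subjects who use the rating scale differently.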
  11. Setup (Exp1)
      - compare DBM and MBM within a person
      - acoustic presentation: given by headphone at the subject's preferred reference loudness level
      - MBM disturbing signals: utterances of Japanese sentences, fragmented into short periods, shuffled and combined

      Group | Trial 1: D30 | Trial 2: M0, Mm10, Mm20
      G1    | DBM 30%      | MBM SNR 0dB
      G2    | DBM 30%      | MBM SNR -10dB
      G3    | DBM 30%      | MBM SNR -20dB
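
Building a shuffled-fragment masker and mixing it with speech at a target SNR might look like the sketch below. The fragment length, the RMS-based SNR definition, and all function names are assumptions; the slides do not give these details.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def shuffle_fragments(samples, fragment_len, seed=0):
    """Cut a signal into short fragments, shuffle them, and rejoin them."""
    fragments = [samples[i:i + fragment_len]
                 for i in range(0, len(samples), fragment_len)]
    random.Random(seed).shuffle(fragments)
    return [v for fragment in fragments for v in fragment]


def mix_at_snr(speech, masker, snr_db):
    """Add the masker to the speech, scaled so that
    20 * log10(rms(speech) / rms(scaled masker)) equals snr_db."""
    if len(masker) < len(speech):
        raise ValueError("masker must be at least as long as the speech")
    masker = masker[:len(speech)]
    masker_rms = rms(masker)
    if masker_rms == 0:
        raise ValueError("masker is silent")
    gain = rms(speech) / (masker_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, masker)]
```

At 0dB the masker is as loud as the speech; at -10dB and -20dB it is louder. The SNR sets the level of the masker, not how much speech information survives, so the Exposed Ratio is not directly readable from it.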
  12. Performance (Exp1)
      - DBM (T1): marginally significant (p<0.1) (G1>G2)
      - the DBM 30% task is harder than MBM 0dB, -10dB, -20dB
      - MBM (T2): the effect of SNR conditions is significant, but only between 0dB and -10dB (p<0.05) (G1>G2)
      [Figure: T1 and T2 scores (vertical axis 30-100) for subjects s101-s105 (DBM 30% vs MBM 0dB), s201-s205 (DBM 30% vs MBM -10dB), and s301-s305 (DBM 30% vs MBM -20dB)]
  13. Workload (Exp1)
      - WWL: individual differences cancelled by subtracting the DBM (D30) score from the MBM (M0, Mm10, and Mm20) score
      - WWL: MBM 0dB < DBM 30%? no significance (ANOVA)
      - MBM: task difficulty is not easy to control
      [Figure: WWL difference (vertical axis -60 to 20) for subjects s101-s105, s201-s205, and s301-s305, in panels DBM 30% vs MBM 0dB, DBM 30% vs MBM -10dB, and DBM 30% vs MBM -20dB]
  14. Human vs Machine (Exp2)
      - the deletion-based method (DBM) is evaluated
      - automatic speech recognition using HMM
        - task: numbers (1-7 digits) in Japanese
        - training: 8440 utterances, 18 states, 20 mixtures
        - evaluation: 1001 utterances, sentence recognition
      - human intelligibility test
        - 75 utterances: 3-, 4-, and 5-digit numbers (3 x 25)
        - subjects: 17 undergraduate students
      - mental workload (WWL) by NASA-TLX
        - normalized within every subject
  15. Results (Exp2)
      - DBM: the Exposed Ratio can control the gap size
      - at DBM 30% the gap is very large; however, the workload is very high
      - significant difference (p<0.05)
      [Figure: Human Ave. (%) and Machine (%) (vertical axis 30-100) and Workload (vertical axis 30-70), each at DBM 30%, 50%, and 70%]
  16. Conclusion
      - audio CAPTCHA task using phonemic restoration
        - deletion-based method (DBM)
      - evaluation of CAPTCHA task
        - performance + mental workload (NASA-TLX)
        - comparison between DBM and MBM
        - DBM: easier to control the task
      - future work
        - ASR evaluation of the mixing-based method
        - improve the noise
        - investigation of phonemic restoration: does it really improve performance, or only decrease workload?
        - word familiarity, speech rate, synthesized speech, ...
