Story of reCAPTCHA

Story of reCAPTCHA: a session by Naga Chokkanathan at CRMIT (http://www.crmit.com/)

Video recording of this session: http://youtu.be/K5XI60uc06c

Published in: Technology

Story of reCAPTCHA

  1. Story of reCAPTCHA, by Naga Chokkanathan
  2. Remember This?
     • CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
     • Security for the website, agreed
     • But for the real users?
     • BORING task
     • Waste of time
     Story of reCAPTCHA, www.crmit.com. © Copyright 2013 CRMIT. All rights reserved.
  3. CAPTCHA
     • Yahoo! popularized it first
     • Later, almost every website started using CAPTCHA to avoid automated attacks
     • Very effective: only people can solve those word/image puzzles
     • But it is a waste of time too
       – Assume you spend 10 seconds on a CAPTCHA
       – Multiply by 200 million CAPTCHAs every day
       – That is about 2 billion seconds, or over 500,000 hours, wasted every day
     • Can something be done about this? (1)
  4. Another Problem
     • Digitizing books
     • Process:
       – Stage 1
         • Scan
         • Convert to image
         • Save
       – Stage 2
         • Use OCR to convert images to text
         • Searchable text
  5. OCR
     • Optical Character Recognition
     • Wonderful technology
     • But not always reliable
     • Especially with old text (due to ancient typefaces, damage, stains, etc.)
     • Can something be done about this? (2)
  6. Possible Solutions
     • Manual corrections
       – Near impossible
       – VERY expensive
     • Using multiple OCR programs
       – They will still make mistakes
       – But not the same mistakes
       – Hopefully!
     • Can something be done about this? (3)
  7. Crowdsourcing
     • Assume each book contains 25000 words
       – Can we split them among 25 people, each correcting 1000 words?
       – Or 50 people, each 500 words?
       – Or 100 people, each 250 words?
       – Or 2500 people, each 10 words?
       – Or 25000 people, each 1 word?
     • Sounds stupid?
       – Think again!
  8. Dr. Luis von Ahn
     • Associate Professor at Carnegie Mellon University
     • Co-coined the term CAPTCHA
     • Pioneer in the field of crowdsourcing
     • Founder of the company reCAPTCHA (later acquired by Google)
  9. reCAPTCHA
  10. reCAPTCHA Process
      • Step 1: Using multiple OCR programs
        – Accept matching words
        – Use dictionary
        – Flag "problematic" words
      • Step 2: reCAPTCHA
        – Millions of users on various websites fill reCAPTCHA forms
          • Proving they are not robots
          • Proofreading text, one word at a time
        – Similar entries are compared before arriving at the final word
  11. How It Works
      • Flagged word
      • Control word (real CAPTCHA)
      • Remember "25000 people, proofreading 1 word at a time"? Not "stupid" anymore!
  12. A Few Statistics
      • 100M+ reCAPTCHAs every day
      • 96000+ websites
        – Most major websites use it
          • Facebook, Twitter, CNN, etc.
      • Security concerns exist!
  13. What We Can Do
      • Use reCAPTCHA instead of CAPTCHA in your websites, wherever required
        – Registration forms, blogs, forums, etc.
        – Easy-to-use widgets
      • Be proud when filling a reCAPTCHA form
        – You are helping Google preserve books ☺
  14. Applying Crowdsourcing
      • Can it solve some of your existing problems?
  15. References, Image Credits
      • https://www.youtube.com/watch?v=VoybhowC4LE
      • http://www.nytimes.com/2011/03/29/science/29recaptcha.html?_r=1&
      • http://techie-buzz.com/tech-news/recaptcha-crowdsourcing-ocr-google-books.html
      • http://www.google.com/recaptcha
      • http://drupal.org/project/captcha
      • http://www.captcha.net/
      • http://www.brothersoft.com/cuneiform-ocr-4384.html
      • http://www.compzets.com/view-upload.php?id=166&action=view
      • http://en.wikipedia.org/
  16. Thank you
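
The two-step process on slide 10 and the flagged/control word pairing on slide 11 can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual reCAPTCHA implementation: the tiny DICTIONARY, the function names, the rule "accept only when both OCR outputs agree and the word is in the dictionary", and the min_votes threshold of 3 are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical stand-in for a real dictionary; a real system would use a full word list.
DICTIONARY = {"the", "quick", "brown", "fox"}


def flag_problem_words(ocr_a, ocr_b, dictionary=DICTIONARY):
    """Step 1 (assumed rule): accept a word when two OCR outputs agree
    and the word is in the dictionary; otherwise flag its position."""
    accepted, flagged = {}, []
    for position, (a, b) in enumerate(zip(ocr_a, ocr_b)):
        if a == b and a.lower() in dictionary:
            accepted[position] = a
        else:
            flagged.append(position)
    return accepted, flagged


def passes_control(control_answer, control_truth):
    """The control word is the real CAPTCHA: its answer is already known."""
    return control_answer.strip().lower() == control_truth.lower()


def resolve_by_vote(answers, min_votes=3):
    """Step 2 (assumed rule): compare human entries for one flagged word and
    return the most common one once it has at least min_votes, else None."""
    if not answers:
        return None
    counts = Counter(answer.strip().lower() for answer in answers)
    word, votes = counts.most_common(1)[0]
    return word if votes >= min_votes else None


# Two OCR programs disagree on positions 1 and 3, so those words get flagged.
accepted, flagged = flag_problem_words(
    ["the", "qu1ck", "brown", "fox"],
    ["the", "quick", "brown", "f0x"],
)

# Only entries from users who typed the control word correctly are counted.
submissions = [("quick", "apple"), ("Quick ", "apple"), ("quick", "apple"), ("qulck", "appel")]
trusted = [word for word, control in submissions if passes_control(control, "apple")]
final_word = resolve_by_vote(trusted)
```

Under these assumptions each user only ever sees one flagged word next to one control word, which is the "25000 people, 1 word each" idea from slide 7.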