Fake news has a negative impact on individuals and society, so fake news detection is becoming a growing field of interest for data scientists. Attempts to leverage artificial intelligence technologies, particularly machine/deep learning techniques and natural language processing (NLP), to automatically detect fake news and prevent its viral spread have recently been actively discussed.
Large technology companies have begun to take steps to address this trend. For example, Google has adjusted its news rankings to prioritize well-known sites and has banned sites with a history of spreading fake news. Facebook has integrated fact checking organizations into its platform.
This SlideShare explores the concept of NLP for detecting fake news in brief.
Weakly Supervised Learning for Fake News Detection on Twitter (Heiko Paulheim)
The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straightforward binary classification problem, the major challenge is the collection of large enough training corpora, since manual annotation of tweets as fake or non-fake news is an expensive and tedious endeavor. In this paper, we discuss a weakly supervised approach, which automatically collects a large-scale, but very noisy training dataset comprising hundreds of thousands of tweets. During collection, we automatically label tweets by their source, i.e., trustworthy or untrustworthy source, and train a classifier on this dataset. We then use that classifier for a different classification target, i.e., the classification of fake and non-fake tweets. Although the labels are not accurate according to the new classification target (not all tweets by an untrustworthy source need to be fake news, and vice versa), we show that despite this unclean, inaccurate dataset, it is possible to detect fake news with an F1 score of up to 0.9.
Query a topic of interest and see the sentiment of the tweets gathered from twitter.com, shown as a pie chart for the day or as a line chart for the week.
Machine Learning based Hybrid Recommendation System
• Developed a Hybrid Movie Recommendation System using both Collaborative and Content-based methods
• Used linear regression framework for determining optimal feature weights from collaborative data
• Recommends the movie with the maximum similarity score from the content-based data
Recently, fake news has been causing many problems for our society. As a result, many researchers have been working on identifying fake news. Most fake news detection systems rely on the linguistic features of the news. However, they have difficulty detecting highly ambiguous fake news, which can be identified only after understanding its meaning and the latest related information. In this paper, to resolve this problem, we present a new Korean fake news detection system using a fact DB, which is built and updated through direct human judgement after collecting obvious facts. Our system receives a proposition and searches the fact DB for semantically related articles in order to verify whether the given proposition is true by comparing it with those articles. To achieve this, we utilize a deep learning model, Bilateral Multi-Perspective Matching for Natural Language Sentences (BiMPM), which has demonstrated good performance on the sentence matching task. However, BiMPM has some limitations: its performance drops as the input sentence gets longer, and it has difficulty making an accurate judgement when an unlearned word or an unlearned relation between words appears. To overcome these limitations, we propose a new matching technique that exploits article abstraction as well as an entity matching set in addition to BiMPM. In our experiment, we show that our system improves the overall performance of fake news detection.
Prasanth. K | Praveen. N | Vijay. S | Auxilia Osvin Nancy. V "Fake News Detection using Machine Learning" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-2, February 2020,
URL: https://www.ijtsrd.com/papers/ijtsrd30014.pdf
Paper Url : https://www.ijtsrd.com/engineering/information-technology/30014/fake-news-detection-using-machine-learning/prasanth-k
The proposed system overcomes the above mentioned issue in an efficient way. It aims at analyzing the number of fraud transactions that are present in the dataset.
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING (ijcsit)
With the advent of the Internet and social media, while many people have benefitted from the vast sources of information available, there has been an enormous rise in cyber-crimes, particularly those targeted at women. According to a 2019 report in the Economic Times [4], India witnessed a 457% rise in cybercrime in the five-year span between 2011 and 2016. Most speculate that this is due to the impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these definitely help in creating a sound social network, creating a user account on these sites usually needs just an email id. A real-life person can create multiple fake IDs, and hence impostors can easily be made. Unlike the real world, where multiple rules and regulations are imposed to identify oneself uniquely (for example while issuing one’s passport or driver’s license), in the virtual world of social media, admission does not require any such checks. In this paper, we study Instagram accounts in particular and try to classify an account as fake or real using Machine Learning techniques, namely Logistic Regression and the Random Forest algorithm.
Project report for Twitter sentiment analysis done using Apache Flume, with the data analysed using Hive.
I intend to address the following questions:
How raw tweets can be used to find an audience’s perception or sentiment about a person?
How Hadoop can be used to solve this problem?
How Apache Hive can be used to organize the final data in a tabular format and query it?
How a data visualization tool can be used to display the findings?
Twitter Sentiment Analysis project done using R.
In this project we work with the tweet data made available to us by Twitter. We clean the tweets, break them into tokens, analyse each word using the bag-of-words concept, and then rate each word as positive, negative or neutral on the basis of its score.
We used a Naive Bayes classifier as our base.
PHOENIX AUDIO TECHNOLOGIES - A large Audio Signal Algorithm Portfolio (HTCS LLC)
Phoenix Audio Technology has the attached publication available which lists their Audio Signal Algorithm Portfolio, e.g. Multi Sensor Processing, Blind Source Separation, Echo and Reference Channel Canceling, Single Sensor Processing, Multi Resolution Analysis, Single Power Compression, Direction Finding, Data Tracking, Data Fusion, and more.
A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot.
It is used, commonly, to protect your sites.
Security and User Experience: A Holistic Model for CAPTCHA Usability Issues (Karthikeyan Umapathy)
CAPTCHA is a widely adopted security measure on the Web and is designed to effectively distinguish humans from bots by exploiting humans’ ability to recognize patterns that an automated bot is incapable of recognizing. To counter this, bots are being designed to recognize patterns in CAPTCHAs. As a result, CAPTCHAs are now being designed to maximize the difficulty for bots to pass human interaction proof tests, which makes them quite an arduous task for humans as well. The approachability of CAPTCHA is increasingly being questioned because of the inconvenience it causes to legitimate users. Irrespective of its popularity, CAPTCHA is indispensable if one wants to avoid potential security threats. We investigated the usability issues associated with CAPTCHA. We built a holistic model by identifying the important concepts associated with CAPTCHAs and their usability. This model can be used as a guide for the design and evaluation of CAPTCHAs.
Captcha a security measure against spam attacks (eSAT Journals)
Abstract: A CAPTCHA is a challenge-response test used to ensure that the response is generated by a human. CAPTCHA tests are administered by machines to differentiate between humans and machines. For this reason CAPTCHAs are also known as Reverse Turing Tests, in contrast to the Turing Test, which is administered by humans. A CAPTCHA is used as a simple puzzle that prevents various automated programs (also known as internet bots) from signing up for e-mail accounts, cracking passwords, sending spam, etc. A common type of CAPTCHA requires the user to recognize the letters in a distorted image, since a human can easily recognize them while a bot cannot. In short, a CAPTCHA program challenges the automated program that is trying to access private data. CAPTCHAs thus help prevent unauthorized automated spamming programs from accessing personal mail accounts. Index Terms: CAPTCHA, Security, Spam Attacks, Reverse Turing Test.
Enhancing Web-Security with Stronger Captchas (Editor IJCATR)
Captchas are used widely over the World Wide Web to prevent automated programs from scraping data from websites. A Captcha is a challenge-response test used to ensure that the response is generated by a person, not by a computer. Users are asked to read and type a string of distorted characters in order to establish whether the user is human. Automation is a real problem for web applications. Automated attacks can exploit many services:
1. Blogs 2. Forums 3. Phishing 4. Theft of data
Registration websites use CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) systems to prevent bot programs from wasting their resources. Technologies today change very rapidly, so spammers and hackers are also trying new ways to crack captchas. That is why it is necessary to develop more advanced techniques for generating captchas, for example generating Captcha images from text or rotating an object within images.
Abstract: CAPTCHA (Completely Automated Public Turing Test to tell Computers and Humans Apart) is a type of test. CAPTCHAs are a standard internet security measure that protects online email and other services from being abused by malicious computer programs, or bots. A CAPTCHA is a test program that protects websites from bots by generating tests that computers cannot pass but humans can pass easily. Different types of CAPTCHA are used for web security. This paper describes different CAPTCHA techniques based on drag-and-drop mouse actions, because a drag-and-drop mouse action is performed by a human, not by a bot. The paper surveys and compares these CAPTCHA techniques.
This CAPTCHA report contains everything you need for a seminar. I hope you like it.
Please check it out and use it.
For more reports, contact me. My email id is rkrakeshkumar99@gmail.com
How effective are CAPTCHAs as a security mechanism against malicious automation? We report and analyze four case studies and draw conclusions as to the best ways to implement CAPTCHAs as an integrated part of a security strategy. Specifically, security teams should use novel CAPTCHA methods that make the CAPTCHA into something enjoyable, like a mini-game. Also, we help identify how to present a CAPTCHA only when users exhibit suspicious behavior by implementing various automation detection mechanisms.
The internet has been playing an increasingly important role in our daily life, with the availability of many web services such as email and search engines. However, these are often threatened by attacks from computer programs such as bots. To address this problem, CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) was developed to distinguish between computer programs and human users. Although this mechanism offers good security and limits automatic registration to web services, some CAPTCHAs have several weaknesses which allow hackers to infiltrate the mechanism of the CAPTCHA. This paper examines recent research on various CAPTCHA methods and their categories. Moreover it discusses the weakness and strength of these types.
A seminar of CAPTCHA
/* A very kind request to the viewers, please don't try to download it and just edit the name. Just take the idea from the presentation and add your innovation to it */
I hope it will be helpful to you all :)
Good Luck :)
The Indian economy is classified into different sectors to simplify the analysis and understanding of economic activities. For Class 10, it's essential to grasp the sectors of the Indian economy, understand their characteristics, and recognize their importance. This guide will provide detailed notes on the Sectors of the Indian Economy Class 10, using specific long-tail keywords to enhance comprehension.
For more information, visit-www.vavaclasses.com
Instructions for Submissions thorugh G- Classroom.pptx (Jheel Barad)
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
How to Make a Field invisible in Odoo 17 (Celine George)
It is possible to hide some fields in Odoo, commonly by using the “invisible” attribute in the field definition. This slide will show how to make a field invisible in Odoo 17.
Model Attribute Check Company Auto Property (Celine George)
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
2024.06.01 Introducing a competency framework for languag learning materials ... (Sandy Millin)
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
The Art Pastor's Guide to Sabbath | Steve Thomason (Steve Thomason)
What is the purpose of the Sabbath Law in the Torah? It is interesting to compare how the context of the law shifts from Exodus to Deuteronomy. Who gets to rest, and why?
How to Split Bills in the Odoo 17 POS Module (Celine George)
Bills play a main role in the point of sale procedure. They help to track sales, handle payments and give receipts to customers. Bill splitting also has an important role in POS. For example, if some friends come together for dinner and want to divide the bill, POS bill splitting makes that possible. This slide will show how to split bills in Odoo 17 POS.
1. UE6858
“CAPTCHAS”
Submitted in fulfillment of Seminar required for the
Bachelor of Engineering (B.E)
In
Information Technology
By
Sachin Narang
UE6858, 8th Semester
Panjab University
Under the Supervision
Of
Ms. Roopali Garg
Associate Professor, UIET
UE6858 Page 1
Contents
1. Cover Page
2. Contents
3. Acknowledgment
4. Declaration
5. Certificate
6. Introduction
7. Why use CAPTCHAS
8. Definitions
9. Types of CAPTCHAS
10. Major Areas Of Applications
11. ReCAPTCHA
12. Breaking of CAPTCHAS
13. New Proposed Approaches
14. Conclusion
15. Bibliography
Acknowledgement
This is to thank all those who supported and helped me throughout the
preparation of this seminar report. I would especially like to thank my
teacher in-charge, Ms. Roopali Garg, for her continuous guidance. I would
also like to thank my friends for their encouragement; each time they
found a mistake they suggested a correction, which helped bring this
seminar to perfection.
Sachin Narang
B.E, I.T, UE6858
U.I.E.T
Declaration
I hereby declare that the work which is being presented in this seminar
report on ‘CAPTCHAS’ submitted at U.I.E.T., Panjab University is an
authentic work presented by Mr. Sachin Narang (UE6858) of B.E. (I.T.) 8th
semester under the supervision of Ms. Roopali Garg.
Sachin Narang
B.E, I.T, UE6858
U.I.E.T
Certificate
This is to certify that Mr. Sachin Narang, UE6858, B.E. (I.T.) 8th Semester,
has completed the seminar report on CAPTCHAS, in accordance with the
requirement for qualifying the 8th semester, under the guidance of
Ms. Roopali Garg.
Roopali Garg
Associate Professor
(Teacher In-Charge)
Introduction
Use of the Internet has increased remarkably across the globe in the
past 10-12 years, and so has the need for security over it.
Marketing and advertising over the Internet have given rise to
companies like Google, which at the moment
is traded at 181 billion USD, i.e. almost twice General
Motors and McDonald's combined.
This presentation is about security achieved over the
Internet using CAPTCHAS. CAPTCHAS are basically
software programs that test whether a user on the
Internet is a human or another
machine. This concept is used by all the big companies
on the Internet, such as Google, Yahoo or Facebook. So
what are these CAPTCHAS? And what are their possible
applications? This is what we cover in our presentation.
Why Use CAPTCHAS
To understand their usage, consider this
story. A few years ago (November 1999), www.Slashdot.org (a
popular site in the US) conducted an online poll.
Students at CMU and MIT quickly wrote programs
that inflated their schools' vote counts, and
ultimately the poll had to be taken down because both
MIT and CMU had tens of thousands of votes while other
schools stayed below a thousand.
There are situations like these where you need to
distinguish whether the user is a human or a computer.
This is where we use CAPTCHAS.
DEFINITIONS
CAPTCHA stands for
Completely Automated Public Turing test to tell Computers
and Humans Apart
A.K.A. Reverse Turing Test, Human Interaction Proof
Turing Test: conducting this test requires two people and a machine.
One person acts as an interrogator, sitting in a
separate room, asking questions and receiving responses; the
goal of the machine is to fool the interrogator.
The challenge here: develop a software program that can
create and grade challenges most humans can pass but
computers cannot.
Types of CAPTCHAS
There are basically 3 types of CAPTCHAS:
1. Text Based: These are the most commonly used
CAPTCHAS. They can be further divided into 3 kinds:
GIMPY: Initially used by Yahoo. This CAPTCHA follows two
steps:
a) Pick a word or words from a small dictionary
b) Distort them and add noise and background
GIMPY-R: This was used by Google and was basically a simple
advance over GIMPY. Here individual letters, rather than
complete words, are distorted. The steps
followed are:
a) Pick random letters
b) Distort them, add noise and background
SIMARD’S: Here a further advance is made by adding arcs,
i.e. curved geometrical shapes. The steps followed are:
a) Pick random letters and numbers
b) Distort them and add arcs
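The two-step recipe shared by the text-based schemes above (pick characters, then distort them), together with the "create and grade" requirement from the definitions, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the actual GIMPY, GIMPY-R or Simard implementation: the function names, the alphabet and the distortion ranges are hypothetical, and the distortion is described as per-character parameters rather than rendered to an image.

```python
import hmac
import random
import string

# Assumed alphabet: letters and digits, as in the Simard-style variant.
ALPHABET = string.ascii_uppercase + string.digits


def make_text_challenge(length=6, rng=None):
    """Step a) pick random characters; step b) attach distortion parameters."""
    rng = rng or random.SystemRandom()
    text = "".join(rng.choice(ALPHABET) for _ in range(length))
    # A renderer (not shown) would draw each glyph with these parameters
    # and then add noise, background or arcs on top.
    glyphs = [
        {
            "char": ch,
            "rotation_deg": rng.uniform(-30, 30),  # hypothetical range
            "y_offset_px": rng.randint(-5, 5),     # hypothetical range
        }
        for ch in text
    ]
    return text, glyphs


def grade(expected, response):
    """Grade the user's answer, ignoring case and surrounding whitespace."""
    cleaned = response.strip().upper().encode()
    return hmac.compare_digest(expected.upper().encode(), cleaned)
```

A server would keep `text` secret, send only the rendered image to the user, and call `grade` on the typed response.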
2. Graphic Based CAPTCHAS: These are based on graphics,
i.e. images and symbols, and are again of two types:
Bongo
The following steps are followed in BONGO CAPTCHAS:
a) Display two series of blocks
b) User must find the characteristic that sets the two series
apart
c) User is asked to determine which series each of four single
blocks belongs to.
PIX
This is the second kind of graphics CAPTCHA, using distorted
images. The steps followed are:
a) Create a large database of labeled images
b) Pick a concrete object
c) Pick four images of the object from the images database
d) Distort the images
e) Ask the user to pick the object from a list of words
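The PIX steps above can be sketched in a few lines. This is a hedged illustration only: the tiny in-memory `IMAGE_DB`, the file names and the function names are hypothetical stand-ins for the "large database of labeled images", and the distortion step (d) is left out because it needs an image library.

```python
import random

# Hypothetical labeled-image database: label -> image file names (step a).
IMAGE_DB = {
    label: [f"{label}_{i}.jpg" for i in range(1, 7)]
    for label in ("cat", "car", "tree", "chair", "boat", "clock")
}


def make_pix_challenge(db, rng, n_images=4, n_choices=5):
    """Build one challenge: four images of one object plus a word list."""
    labels = sorted(db)
    answer = rng.choice(labels)                # step b) pick a concrete object
    images = rng.sample(db[answer], n_images)  # step c) pick four of its images
    # Step d) distortion would be applied to `images` here (omitted).
    distractors = rng.sample([w for w in labels if w != answer], n_choices - 1)
    choices = distractors + [answer]
    rng.shuffle(choices)                       # step e) present a list of words
    return {"images": images, "choices": choices}, answer


def grade_pix(answer, picked_word):
    """The user passes only by picking the word that labels all four images."""
    return picked_word == answer
```

Only the `images` and `choices` go to the user; the `answer` stays on the server for grading.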
3. Audio Based CAPTCHAS:
These are based on humans' ability to recognize sounds that may
be distorted. The following algorithm is used:
a) Pick a word or a sequence of numbers at random
b) Render them into an audio clip using TTS software
c) Distort the audio clip
d) Ask the user to identify and type the word or numbers
MAJOR AREAS OF APPLICATIONS:
CAPTCHAs have several applications for practical security,
including (but not limited to):
• Preventing Comment Spam in Blogs. Most bloggers
are familiar with programs that submit bogus comments,
usually for the purpose of raising search engine ranks of
some website (e.g., "buy penny stocks here"). This is called
comment spam. By using a CAPTCHA, only humans can
enter comments on a blog. There is no need to make users
sign up before they enter a comment, and no legitimate
comments are ever lost!
• Protecting Website Registration. Several companies
(Yahoo!, Microsoft, etc.) offer free email services. Up until a
few years ago, most of these services suffered from a
specific type of attack: "bots" that would sign up for
thousands of email accounts every minute. The solution to
this problem was to use CAPTCHAs to ensure that only
humans obtain free accounts. In general, free services
should be protected with a CAPTCHA in order to prevent
abuse by automated scripts.
• Protecting Email Addresses From Scrapers.
Spammers crawl the Web in search of email addresses
posted in clear text. CAPTCHAs provide an effective
mechanism to hide your email address from Web scrapers.
The idea is to require users to solve a CAPTCHA before
showing your email address. A free and secure
implementation that uses CAPTCHAs to obfuscate an email
address can be found at reCAPTCHA MailHide.
• Online Polls. In November 1999, http://www.slashdot.org
released an online poll asking which was the best graduate
school in computer science (a dangerous question to ask
over the web!). As is the case with most online polls, IP
addresses of voters were recorded in order to prevent single
users from voting more than once. However, students at
Carnegie Mellon found a way to stuff the ballots using
programs that voted for CMU thousands of times. CMU's
score started growing rapidly. The next day, students at MIT
wrote their own program and the poll became a contest
between voting "bots." MIT finished with 21,156 votes,
Carnegie Mellon with 21,032 and every other school with
less than 1,000. Can the result of any online poll be trusted?
Not unless the poll ensures that only humans can vote.
• Preventing Dictionary Attacks. CAPTCHAs can also be
used to prevent dictionary attacks in password systems. The
idea is simple: prevent a computer from being able to iterate
through the entire space of passwords by requiring it to solve
a CAPTCHA after a certain number of unsuccessful logins.
This is better than the classic approach of locking an
account after a sequence of unsuccessful logins, since doing
so allows an attacker to lock accounts at will.
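The idea of demanding a CAPTCHA after a certain number of failed logins, instead of locking the account, can be sketched as below. This is a minimal illustration under stated assumptions: the `LoginGuard` class, its method names and the threshold of three attempts are hypothetical, and the plain-text password store is for demonstration only.

```python
from collections import defaultdict


class LoginGuard:
    """Require a solved CAPTCHA once a user has too many failed logins."""

    def __init__(self, passwords, max_free_attempts=3):
        self._passwords = passwords          # demo only: never store plain text
        self._failures = defaultdict(int)    # user -> consecutive failures
        self._max_free_attempts = max_free_attempts

    def captcha_required(self, user):
        return self._failures[user] >= self._max_free_attempts

    def login(self, user, password, captcha_passed=False):
        # The account is never locked: a human who solves the CAPTCHA can
        # still log in, while a script iterating over passwords is stopped.
        if self.captcha_required(user) and not captcha_passed:
            return "captcha_required"
        if self._passwords.get(user) == password:
            self._failures[user] = 0
            return "ok"
        self._failures[user] += 1
        return "denied"
```

Because the failure counter only gates the CAPTCHA and never blocks the account, an attacker cannot lock out a legitimate user by deliberately failing logins.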
• Search Engine Bots. It is sometimes desirable to keep
webpages unindexed to prevent others from finding them
easily. There is an html tag to prevent search engine bots
from reading web pages. The tag, however, doesn't
guarantee that bots won't read a web page; it only serves to
say "no bots, please." Search engine bots, since they usually
belong to large companies, respect web pages that don't
want to allow them in. However, in order to truly guarantee
that bots won't enter a web site, CAPTCHAs are needed.
• Worms and Spam. CAPTCHAs also offer a plausible
solution against email worms and spam: "I will only accept
an email if I know there is a human behind the other
computer." A few companies are already marketing this idea.
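The dictionary-attack defence from the list above can be sketched
in a few lines. This is only an illustration under stated
assumptions: the class name LoginGuard, the threshold of 3
failures, and the plain-text password table are all hypothetical
and do not come from any real system. The point it shows is that
the account is never locked; after repeated failures a login is
simply refused until a CAPTCHA has been solved.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the text only says "a certain number" of failures.
MAX_FREE_ATTEMPTS = 3


@dataclass
class LoginGuard:
    """Tracks failed logins per account and decides when a CAPTCHA is needed."""

    passwords: dict  # username -> password (plain text only for illustration)
    failures: dict = field(default_factory=dict)

    def captcha_required(self, user: str) -> bool:
        return self.failures.get(user, 0) >= MAX_FREE_ATTEMPTS

    def attempt(self, user: str, password: str, captcha_solved: bool = False) -> str:
        # After too many failures the attempt is gated, but the account
        # stays usable, so an attacker cannot lock the owner out.
        if self.captcha_required(user) and not captcha_solved:
            return "captcha_required"
        if self.passwords.get(user) == password:
            self.failures[user] = 0
            return "ok"
        self.failures[user] = self.failures.get(user, 0) + 1
        return "denied"
```

A script guessing passwords gets only three free tries per
account; every further guess costs one solved CAPTCHA, while the
legitimate owner can still log in by solving a single one.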
ReCAPTCHA
ReCAPTCHA is a free CAPTCHA service that helps to digitize
books, newspapers and old-time radio shows.
About 200 million CAPTCHAs are solved by humans around the
world every day. In each case, roughly ten seconds of human
time are being spent. Individually, that's not a lot of time, but in
aggregate these little puzzles consume more than 150,000 hours
of work each day. What if we could make positive use of this
human effort? ReCAPTCHA does exactly that by channeling the
effort spent solving CAPTCHAs online into "reading" books.
To archive human knowledge and to make information more
accessible to the world, multiple projects are currently digitizing
physical books that were written before the computer age. The
book pages are being photographically scanned, and then
transformed into text using "Optical Character Recognition"
(OCR). The transformation into text is useful because scanning a
book produces images, which are difficult to store on small
devices, expensive to download, and cannot be searched. The
problem is that OCR is not perfect.
ReCAPTCHA improves the process of digitizing books by sending
words that cannot be read by computers to the Web in the form of
CAPTCHAs for humans to decipher. More specifically, each word
that cannot be read correctly by OCR is placed on an image and
used as a CAPTCHA. This is possible because most OCR
programs alert you when a word cannot be read correctly.
But if a computer can't read such a CAPTCHA, how does the
system know the correct answer to the puzzle? Here's how: Each
new word that cannot be read correctly by OCR is given to a user
in conjunction with another word for which the answer is already
known. The user is then asked to read both words. If they solve
the one for which the answer is known, the system assumes their
answer is correct for the new one. The system then gives the new
image to a number of other people to determine, with higher
confidence, whether the original answer was correct.
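The control-word idea described above can be sketched as follows.
This is a simplified illustration, not ReCAPTCHA's actual
implementation: the class UnknownWord, the min_votes parameter and
the case-insensitive comparison are assumptions made for the
example. A guess for the unknown word is counted only when the
same user read the known control word correctly, and the word is
accepted once enough users agree.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Compare answers ignoring case and surrounding whitespace (an assumption)."""
    return text.strip().lower()


class UnknownWord:
    """Collects human readings for one word that OCR could not read."""

    def __init__(self, min_votes: int = 3):
        self.min_votes = min_votes  # hypothetical agreement threshold
        self.votes = Counter()

    def record(self, control_answer: str, control_guess: str, unknown_guess: str) -> bool:
        # Trust the reading of the unknown word only if the user solved
        # the control word, whose answer is already known.
        if normalize(control_guess) != normalize(control_answer):
            return False
        self.votes[normalize(unknown_guess)] += 1
        return True

    def consensus(self):
        """Return the agreed reading, or None while confidence is too low."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.min_votes else None
```

With min_votes=2, two users who both pass their control word and
both type "upon" for the unknown word make consensus() return
"upon"; a user who fails the control word contributes nothing.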
BREAKING OF CAPTCHAS
Two methods have been used so far to break CAPTCHAs: one uses
decoding software that removes noise, and the other uses humans.
1. Some text-based CAPTCHAs have been broken by software that
works in three stages:
PreProcessing: Removal of background clutter and noise.
Segmentation: Splitting the image into regions that each contain
a single character.
Classification: Identifying the character in each region.
2. Other CAPTCHAs can be broken by streaming the tests to
unsuspecting users to solve.
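The three software stages listed above can be illustrated with a
toy example. Everything here is an assumption made for the sketch:
the image is a tiny grid of grayscale values, noise is removed by
simple thresholding, characters are split at empty columns, and
classification is exact-size template matching against two made-up
glyphs. Real attacks use far more robust techniques at each stage.

```python
def preprocess(image, threshold=128):
    """Binarize: pixels darker than the threshold become ink (1), others 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment(binary):
    """Split the image into character regions at columns that hold no ink."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    regions, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last region
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            regions.append([row[start:x] for row in binary])
            start = None
    return regions


def classify(region, templates):
    """Return the label of the template with the fewest differing pixels."""
    def distance(a, b):
        if len(a) != len(b) or len(a[0]) != len(b[0]):
            return float("inf")  # templates of another size never match
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return min(templates, key=lambda label: distance(region, templates[label]))


def solve(image, templates):
    return "".join(classify(r, templates) for r in segment(preprocess(image)))


# Hypothetical 3-pixel-high glyphs and a noisy image spelling "LI".
TEMPLATES = {"I": [[1], [1], [1]], "L": [[1, 0], [1, 0], [1, 1]]}
IMAGE = [
    [0, 255, 200, 0],   # 200 is light background noise
    [0, 255, 255, 0],
    [0, 0, 255, 0],
]
print(solve(IMAGE, TEMPLATES))  # prints "LI"
```

The sketch also shows why distortion and overlapping characters
make CAPTCHAs harder: once characters touch, splitting at empty
columns no longer works.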
New Proposed Approaches
This new approach is very similar to the PIX CAPTCHAs discussed
earlier. The steps followed in using it are:
• Pick a concrete object
• Get 6 images at random from images.google.com that match the
object
• Distort the images
• Build a list of 100 words: 90 from a full dictionary, 10 from the
objects dictionary
• Prompt the user to pick the object from the list of words
• Make an HTTP call to images.google.com and search for the
object
• Screen scrape the result of 2-3 pages to get the list of images
• Pick 6 images at random
• Randomly distort both the images and their URLs before
displaying them
• Expire the CAPTCHA in 30-45 seconds
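The word-list and expiry steps above can be sketched as follows.
This is only a sketch under assumptions: the functions
build_challenge and check and their parameters are hypothetical,
the target object is assumed to count as one of the 10 object
words, and the image search, scraping and distortion steps are
left out because they depend on an external service.

```python
import random
import time


def build_challenge(target, object_words, dictionary_words, rng=None, now=None):
    """Build a 100-word answer list: 10 object words (incl. target) + 90 fillers."""
    if rng is None:
        rng = random.Random()
    if now is None:
        now = time.time()
    objects = set(object_words)
    other_objects = [w for w in object_words if w != target]
    fillers = [w for w in dictionary_words if w not in objects]
    choices = [target] + rng.sample(other_objects, 9) + rng.sample(fillers, 90)
    rng.shuffle(choices)
    return {
        "answer": target,
        "choices": choices,
        # Expire after a random 30-45 seconds to limit streaming attacks.
        "expires_at": now + rng.uniform(30, 45),
    }


def check(challenge, picked, now=None):
    """Accept the answer only if it is correct and the challenge is still valid."""
    if now is None:
        now = time.time()
    return now <= challenge["expires_at"] and picked == challenge["answer"]
```

The caller is expected to supply at least 10 object words and 90
other dictionary words; the rng and now parameters exist only so
the behaviour can be reproduced in tests.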
Benefits of this approach:
• The database already exists and is public
• The database is constantly being updated and maintained
• Adding “concrete objects” to the dictionary is virtually
instantaneous
• Distortion prevents caching hacks
• Quick expiration limits streaming hacks
Drawbacks of this approach:
• Not accessible to people with disabilities (which is the case
with most CAPTCHAs)
• Relies on Google’s infrastructure
• Unlike CAPTCHAs using random letters and numbers, the number
of challenge words is limited.
Conclusion
1. A CAPTCHA is any program that distinguishes humans from
machines.
2. Research on CAPTCHAs implies advancement in AI, making
computers understand how humans think.
3. Internet companies make billions of dollars every year;
their security and service quality matter, and so does the
advancement of CAPTCHA technology.
4. Different CAPTCHA methods are being studied, but new ideas
such as ReCAPTCHA, which puts the human time spent on the
internet to productive use, are especially promising.