
CAPTCHA in Web Security : Secure or Not ?

Presented By Abhishek Sharma (08CE04)

What Does a CAPTCHA Look Like ?

CAPTCHA Used By Google

CAPTCHA : The Acronym

Completely Automated Public Turing Test to Tell Computers and Humans Apart

CAPTCHA : Literal Meaning

Completely : Whole
Automated : Made by Machine
Public : Universally Known
Turing Test to Tell : Test presented by Alan Turing
Computers and Humans Apart

Introduction
History
The Need for CAPTCHA
Basic Terminologies
Earlier CAPTCHAs
How Does a CAPTCHA Work?
Types of CAPTCHA
Implementation of CAPTCHA
Can CAPTCHA Be Broken?
CAPTCHA Guidelines
Applications
Benefits of CAPTCHA
Limitations of CAPTCHA
Conclusion

A CAPTCHA is a type of challenge-response test used in computing to try to ensure that the response is generated by a person and not by a computer.

It is needed because activities such as online commerce transactions, search engine submissions, web polls, web registrations, free e-mail service registrations and other automated services are targets for abuse by software programs, or bots.

CAPTCHA : History
1997: Andrei Broder at AltaVista wanted to prevent bots from automatically submitting sites for indexing.
He decided to add a test to the submission page, which he built by reversing the OCR optimization techniques recommended for a Brother scanner.

2000: Luis von Ahn, Manuel Blum & John Langford at CMU trademarked CAPTCHA.
Yahoo! partnered with CMU to counter these threats in its Messenger chat service.

CAPTCHA : The Basic Needs

In 1999, an online poll was issued asking users to pick the best computer science school in the US. Students at MIT and Carnegie Mellon University created voting bots that voted for their schools over and over again.

MIT finished with 21,156 votes and Carnegie Mellon finished with 21,032 votes.
All other schools finished with fewer than 1,000 votes. This proved that online polls could not be trusted unless they ensured that only humans could vote. In September 2000, Yahoo! reported that bots were entering its online chat rooms and pointing legitimate users to advertising sites.


Yahoo! turned to CMU to help solve the problem. Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford developed CAPTCHA. They determined that CAPTCHAs should:
1. Present challenges that are automatically generated and graded.
2. Be simple enough for humans to complete quickly and easily.
3. Accept virtually all human users, rejecting very few.
4. Reject virtually all machine users.
5. Resist automated attacks for many years to come.

CAPTCHA : Terminologies
Turing Test

Challenge Response Test


Terminologies : BOTS
A bot is a software program on the Internet: a software agent that interacts with network services intended for people as if it were a real person. Types of bot:
1. Voting Bots
2. Email Account Registration Bots
3. Email Spam Bots

Terminologies : Turing Test

The mathematician Alan Turing imagined a game played by three players. One of them is an interrogator, who has to work out which of the other two is the machine.

What is a Turing test?
It is a test of a machine's level of intelligence. A human judge asks questions of two participants, one of which is a machine, without knowing which is which. If the judge cannot tell which participant is the machine, the machine passes the test.
CAPTCHA employs a reverse Turing test: the judge is the CAPTCHA program and the participant is the user. If the user passes the CAPTCHA, the user is taken to be human; if the user fails, the user is taken to be a machine.

Terminologies : Challenge Response Test & Spam

What is Challenge Response Test ?
A challenge-response test is a test involving a set of questions (or "challenges") that a person or other entity has to answer in order to pass. If the person or entity provides an adequate response to the challenges, it is deemed to have passed the test.

What is SPAM ?
Spamming is the act of sending unwanted electronic messages in bulk. The most widely recognized form of spam is commercial advertising delivered by e-mail. Sending bulk messages in this fashion, to recipients who have not asked for them, has come to be known as spamming, and the messages themselves as spam.

CAPTCHA : Earlier Design

Gimpy: Each puzzle displays ten distorted, overlapping words chosen at random from a dictionary of simple words. Solving the puzzle requires identifying only three of the ten words and typing them into the box provided. An example is shown in the figure.

CAPTCHA : How Does It Work ?

A CAPTCHA image is chosen at random from a stored database and displayed on the web page. Each database record has two attributes: the image and the key associated with that image. When the user enters the letters in the text box provided, they are compared with the secret key. If they match, the user is redirected to the next page; otherwise, a new CAPTCHA image is displayed and the same process is repeated.
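The database-driven flow described above can be sketched in Python as follows. This is a minimal illustration, assuming a small in-memory list stands in for the stored database; the file names, keys and the function names `pick_challenge`, `check_answer` and `handle_submission` are hypothetical, and drawing the image and performing the redirect are left to the web framework.

```python
import secrets

# Hypothetical stand-in for the stored database: each record pairs an image with its key.
CAPTCHA_DB = [
    {"image": "captcha_001.png", "key": "x7Kp2"},
    {"image": "captcha_002.png", "key": "Qm4tz"},
    {"image": "captcha_003.png", "key": "b9RwL"},
]

def pick_challenge(db):
    """Randomly select one record whose image will be shown on the page."""
    return secrets.choice(db)

def check_answer(record, user_input):
    """Compare the letters the user typed with the record's secret key (case-sensitive)."""
    return user_input.strip() == record["key"]

def handle_submission(record, user_input, db):
    """Return ("redirect", None) on a match, or ("retry", new_record) otherwise."""
    if check_answer(record, user_input):
        return "redirect", None
    # Wrong answer: display a fresh image and repeat the process.
    return "retry", pick_challenge(db)
```

Whether the comparison should ignore letter case is a design choice; this sketch assumes an exact match after trimming whitespace.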

CAPTCHA : Different Types of CAPTCHAs

Text Based CAPTCHA
Graphic Based CAPTCHA
Gimpy CAPTCHA
E-Z Gimpy CAPTCHA
Audio Based CAPTCHA
reCAPTCHA and book digitization

Text Based CAPTCHA

Uses simple, natural-language questions, for example:
What is the sum of three and thirty-five?
If today is Saturday, what is the day after tomorrow?
Which of mango, table, water is a fruit?
Very effective, but needs a large question bank. Users with cognitive impairments find it hard.
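One way to keep such a question bank large is to mix a few fixed questions with arithmetic questions generated on the fly. The sketch below assumes that approach; the names `STATIC_QUESTIONS`, `make_sum_question`, `pick_question` and `is_correct` are hypothetical, and number spelling is limited to 0 to 99.

```python
import random

# Hypothetical fixed questions: (question, set of accepted lower-case answers).
STATIC_QUESTIONS = [
    ("Which of mango, table, water is a fruit?", {"mango"}),
    ("If today is Saturday, what is the day after tomorrow?", {"monday"}),
]

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out 0 <= n <= 99 in English, e.g. 35 -> 'thirty-five'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"

def make_sum_question(rng):
    """Generate 'What is the sum of X and Y?' with both operands spelled out."""
    a, b = rng.randint(1, 49), rng.randint(1, 49)  # keeps the sum below 100
    total = a + b
    question = f"What is the sum of {number_to_words(a)} and {number_to_words(b)}?"
    return question, {str(total), number_to_words(total)}

def pick_question(rng):
    """Half the time reuse a fixed question, otherwise generate a fresh sum."""
    if rng.random() < 0.5:
        return rng.choice(STATIC_QUESTIONS)
    return make_sum_question(rng)

def is_correct(accepted, reply):
    """Accept the reply if it matches any accepted answer, ignoring case and spaces."""
    return reply.strip().lower() in accepted
```

Accepting both digits and words ("38" or "thirty-eight") is an assumption meant to reduce failures by genuine users.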


Types of Text Based CAPTCHA
1. Printed CAPTCHA
2. H-CAPTCHA

Text Based CAPTCHA : Printed CAPTCHA

Printed CAPTCHAs are difficult to break. Many algorithms are available to generate them, although humans cannot always identify them very easily. There are two major types: Baffle Text and Pessimal Print.


Baffle Text Based CAPTCHA

Developed by Monica Chew and Henry Baird. Uses pronounceable words that are not present in the English dictionary, obscured with masking.
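The "pronounceable but not in the dictionary" idea can be illustrated with a very simple generator. This sketch is not Chew and Baird's generator: it assumes plain consonant-vowel syllables, uses a tiny hypothetical `DICTIONARY` in place of a full English word list, and does not implement the masking step.

```python
import random

CONSONANTS = "bcdfghjklmnprstvwz"
VOWELS = "aeiou"

# Tiny hypothetical dictionary; a real system would load a full English word list.
DICTIONARY = {"banana", "robot", "tiger", "lemon", "salad"}

def pronounceable_word(rng, syllables=3):
    """Alternate consonants and vowels so the string can be read aloud."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))

def nonsense_word(rng, dictionary=DICTIONARY, syllables=3):
    """Keep generating until the word is NOT a real dictionary word."""
    while True:
        word = pronounceable_word(rng, syllables)
        if word not in dictionary:
            return word
```

Rejecting real words (a consonant-vowel pattern can produce "banana", for example) is what keeps dictionary attacks from helping.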

Pessimal Print Image CAPTCHA

Developed by Allison Coates, Henry Baird and Richard Fateman. Uses a degradation model that simulates the physical defects caused by printing and scanning text.
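To give a feel for image degradation, the sketch below applies two crude print/scan defects to a small binary glyph. It is a much-simplified stand-in, not the published Pessimal Print degradation model: the defects ("ink spread" and "speckle"), the bitmap format and the `degrade` function with its `p_flip` and `p_spread` parameters are all assumptions.

```python
import random

def degrade(glyph, rng, p_flip=0.05, p_spread=0.2):
    """Return a degraded copy of a binary glyph given as strings of '#' (ink) and '.'.

    Simplified defects (assumptions, not the published model):
      * ink spread: a blank pixel touching ink turns black with probability p_spread
      * speckle: any pixel is inverted with probability p_flip
    """
    h, w = len(glyph), len(glyph[0])
    ink = [[c == "#" for c in row] for row in glyph]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            pixel = ink[y][x]
            if not pixel:
                touches_ink = any(
                    ink[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w
                )
                if touches_ink and rng.random() < p_spread:
                    pixel = True  # ink bleeding into a neighbouring blank pixel
            if rng.random() < p_flip:
                pixel = not pixel  # scanner speckle
            row.append("#" if pixel else ".")
        out.append("".join(row))
    return out
```

The spread decisions read the original glyph, not the partly degraded one, so ink cannot "creep" across several pixels in a single pass.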

Graphic Based CAPTCHA


1. A visual recognition problem.
2. Two sets of shapes with a distinguishing characteristic.
3. The user must choose which set a given shape belongs to.

Another image-based CAPTCHA uses a database of labeled images of recognizable objects. The program randomly chooses an object and displays N pictures of it.

The user must correctly identify the object.

The pictures are distorted.
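The labeled-image scheme can be sketched as follows. The database contents, the value of N and the names `IMAGE_DB`, `make_picture_challenge` and `check_picture_answer` are hypothetical, and the distortion of the pictures is not shown.

```python
import random

# Hypothetical labeled image database: object label -> picture file names.
IMAGE_DB = {
    "horse": ["horse1.jpg", "horse2.jpg", "horse3.jpg", "horse4.jpg", "horse5.jpg"],
    "car": ["car1.jpg", "car2.jpg", "car3.jpg", "car4.jpg"],
    "tree": ["tree1.jpg", "tree2.jpg", "tree3.jpg", "tree4.jpg"],
}

def make_picture_challenge(db, rng, n=4):
    """Randomly choose an object and return (label, n distinct pictures of it)."""
    eligible = [label for label, images in db.items() if len(images) >= n]
    label = rng.choice(eligible)
    return label, rng.sample(db[label], n)

def check_picture_answer(label, reply):
    """The user passes by naming the object shown in all the pictures."""
    return reply.strip().lower() == label
```

A real deployment would probably also accept synonyms ("automobile" for "car"); this sketch accepts only the exact label.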

Gimpy CAPTCHA

Designed by Yahoo! and CMU. Picks 10 random words from a dictionary, distorts them and fills the image with noise. The user has to recognize at least 3 of the words; if the user is correct, they are admitted. An example of Gimpy is shown in the figure.
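Gimpy's puzzle selection and grading rule can be expressed in a few lines. The word list and the names `SIMPLE_WORDS`, `make_gimpy_puzzle` and `grade_gimpy` are hypothetical; drawing the distorted, noisy image is not shown.

```python
import random

# Hypothetical dictionary of simple words.
SIMPLE_WORDS = ["apple", "bread", "chair", "dog", "earth", "fish", "glass",
                "house", "juice", "king", "lamp", "moon", "nest", "ocean"]

def make_gimpy_puzzle(rng, words=SIMPLE_WORDS, count=10):
    """Choose `count` distinct random words to distort and overlap in the image."""
    return rng.sample(words, count)

def grade_gimpy(puzzle_words, typed, required=3):
    """Pass if the user typed at least `required` distinct words shown in the puzzle."""
    shown = set(puzzle_words)
    guesses = {w.lower() for w in typed.split()}
    return len(guesses & shown) >= required
```

Counting distinct words means typing the same correct word three times does not pass.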



E-Z Gimpy CAPTCHA

A modified version of Gimpy, used by Yahoo! in Messenger. It shows only one random string of characters. Since the string is not a dictionary word, it is not prone to dictionary attacks. It is not a good implementation, however: it has already been broken by OCR.


Audio Based CAPTCHA

Consists of a downloadable audio clip. The user listens to it and enters the spoken word. Helps visually impaired users. Google's audio-enabled CAPTCHA is shown in the figure. Not popular.


reCAPTCHA & Book Digitization

Used to verify digitized books: reCAPTCHA is used in the Google Books project. Two words are shown, and the program knows the first word. If the user enters the first word correctly, the program assumes that the second, unknown word has also been entered correctly, and the second word becomes known.
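The two-word scheme can be sketched as a small class that grades only the control word and collects readings of the unknown word. The agreement threshold, the class name `BookWordVerifier` and its `submit` method are hypothetical; requiring several matching human readings before a word counts as "known" is an assumption added here, not a detail given above.

```python
from collections import Counter, defaultdict

class BookWordVerifier:
    """One control word with a known answer plus one unknown word from a scanned book.

    `agreement` (matching human readings needed before the unknown word is
    accepted) is a hypothetical parameter.
    """

    def __init__(self, agreement=3):
        self.agreement = agreement
        self.votes = defaultdict(Counter)  # unknown word id -> counts of readings
        self.known = {}                    # unknown word id -> accepted reading

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Grade on the control word only; record the unknown-word reading on a pass."""
        if control_typed.strip().lower() != control_answer.lower():
            return False  # failed the only part that can actually be checked
        reading = unknown_typed.strip().lower()
        self.votes[unknown_id][reading] += 1
        if self.votes[unknown_id][reading] >= self.agreement:
            self.known[unknown_id] = reading  # the second word becomes known
        return True
```

Readings from users who fail the control word are discarded, since those users may be bots.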


Implementation & Creation

Creating CAPTCHA in Different Fashion
1. One way to create a CAPTCHA is to pre-determine the images and solutions it will use. This approach requires a database that includes all the CAPTCHA solutions, which can compromise the reliability of the test: anyone who obtains the database can pass every challenge.
2. A CAPTCHA can be created from an image and some characters by applying effects such as blurring and distortion.
3. You can make your own CAPTCHA for a web forum by using a randomizing function that generates strings at random.
4. A CAPTCHA might show a series of shapes and ask the user which shape, among several choices, would logically come next. The problem with this approach is that not all humans are good at these kinds of problems, and the success rate for human users can drop below 80 percent.
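Approach 3, a randomizing function, can be as simple as the sketch below. It uses Python's `secrets` module so the strings are not predictable; the trimmed `ALPHABET` (dropping look-alike characters) and the function name `random_captcha_text` are hypothetical choices, not part of any particular CAPTCHA builder.

```python
import secrets
import string

# Hypothetical alphabet: letters and digits minus look-alikes (0/O, 1/l/I) humans confuse.
ALPHABET = "".join(c for c in string.ascii_letters + string.digits if c not in "0O1lI")

def random_captcha_text(length=6):
    """Generate a random, non-dictionary string with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Unlike approach 1, nothing needs to be stored in advance, so there is no answer database to leak; only the current string has to be remembered until the user replies.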



There are two basic implementations of CAPTCHA for a website or web forum.
1. Embeddable CAPTCHAs: The easiest way to add a CAPTCHA to a website is to insert a few lines of CAPTCHA code, taken from an open source CAPTCHA builder, into the website's HTML code; the builder then provides the authentication service remotely. Most such services are free. Popular among them is the service provided by the reCAPTCHA project.

2. Custom CAPTCHAs: These are less popular because of the extra work needed to create a secure implementation. However, they are popular among researchers who verify existing CAPTCHAs and suggest alternative implementations.
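With an embeddable service such as reCAPTCHA, the page's HTML snippet gives the browser a token, and the website's server must still confirm that token with the service. The sketch below targets Google's documented `siteverify` endpoint, which takes `secret` and `response` (plus optional `remoteip`) and replies with JSON containing a `success` flag; it is a minimal sketch using only the standard library, the helper names are hypothetical, and Google's current documentation should be checked before relying on it.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request that checks a user's reCAPTCHA token server-side."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(SITEVERIFY_URL, data=data, method="POST")

def parse_verify_response(body):
    """Return True only if the JSON reply explicitly reports success."""
    return json.loads(body).get("success") is True

def verify_token(secret, token, remote_ip=None, timeout=5):
    """Network call: send the token to the verification endpoint and parse the reply."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_verify_response(resp.read())
```

Splitting request building and response parsing from the network call keeps the security-relevant logic testable without contacting the service.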

Can CAPTCHA be broken ?


The answer to this question is: YES! Given enough effort, absolutely every CAPTCHA algorithm can be broken.

Breaking A CAPTCHA


A very popular method for breaking a CAPTCHA is OCR (Optical Character Recognition). Most text-based CAPTCHAs have been broken by Computer Character Recognition software. Other CAPTCHAs were broken by relaying the tests to unsuspecting users to solve.

Breaking A CAPTCHA : Computer Character Recognition

A number of research projects have attempted (often successfully) to beat visual CAPTCHAs by creating programs with the following functionality:

1. Pre-processing
2. Segmentation
3. Classification

Computer Character Recognition : Step By Step Process

1. Pre-processing: Algorithms are applied to remove the effects of distortion, blurring, clutter, background noise, etc. This is an easy problem for computers to solve.
2. Segmentation: The image is split into regions that each contain a single character. This step is complex and computationally expensive.
3. Character Recognition: OCR software is used to identify the characters.
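The three steps can be traced on a toy image. In this sketch the "image" is a small grayscale grid, pre-processing is thresholding plus removal of isolated noise pixels, segmentation splits at blank columns, and simple template matching stands in for the OCR stage. The 3x3 `TEMPLATES` and all function names are hypothetical; real attacks use far stronger segmentation and trained classifiers, and this naive column split fails as soon as letters touch.

```python
# Hypothetical 3x3 glyph templates ('#' = ink).
TEMPLATES = {
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
    "O": ["###", "#.#", "###"],
}

def preprocess(gray, threshold=128):
    """Step 1: binarize (0 = black) and drop ink pixels with no ink among 8 neighbours."""
    h, w = len(gray), len(gray[0])
    ink = [[gray[y][x] < threshold for x in range(w)] for y in range(h)]

    def has_neighbor(y, x):
        return any(
            ink[y + dy][x + dx]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w
        )

    return [[ink[y][x] and has_neighbor(y, x) for x in range(w)] for y in range(h)]

def segment(binary):
    """Step 2: split the image at blank columns into one bitmap per character."""
    h, w = len(binary), len(binary[0])
    pieces, start = [], None
    for x in range(w + 1):
        filled = x < w and any(binary[y][x] for y in range(h))
        if filled and start is None:
            start = x
        elif not filled and start is not None:
            pieces.append(["".join("#" if binary[y][c] else "." for c in range(start, x))
                           for y in range(h)])
            start = None
    return pieces

def classify(piece):
    """Step 3: choose the same-sized template with the fewest mismatched pixels."""
    best, best_cost = "?", None
    for label, tmpl in TEMPLATES.items():
        if len(tmpl) != len(piece) or len(tmpl[0]) != len(piece[0]):
            continue
        cost = sum(a != b for tr, pr in zip(tmpl, piece) for a, b in zip(tr, pr))
        if best_cost is None or cost < best_cost:
            best, best_cost = label, cost
    return best

def solve(gray):
    """Run the whole pipeline and return the recognized text."""
    return "".join(classify(p) for p in segment(preprocess(gray)))
```

For example, a three-row image spelling "TOL" with one stray noise pixel to the right is read as "TOL": the stray pixel is removed in step 1, so step 2 finds exactly three characters.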

Guidelines For CAPTCHA

Accessibility
All users need to have access to the protected site. For example, visually impaired users need audio CAPTCHAs.


Image Security
Images must be secure enough to prevent OCR-based attacks, through random and thorough distortion techniques.

Script Security
The scripts must be secure as well. Passwords should be passed only as encrypted text, and sessions should be destroyed after a CAPTCHA is solved.
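On the server side, two of these rules can be sketched directly: never keep the plain answer, and destroy the session after one attempt. In this sketch the server stores only an HMAC of the expected answer, compares in constant time, and deletes the session on every attempt, right or wrong. The `SERVER_KEY`, the two-minute expiry and the `CaptchaSessions` class are hypothetical; encrypting traffic in transit (HTTPS) is assumed and not shown.

```python
import hashlib
import hmac
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # hypothetical per-deployment secret

def _digest(answer):
    """Keyed hash of the normalized answer; the plain answer is never stored."""
    return hmac.new(SERVER_KEY, answer.strip().lower().encode(), hashlib.sha256).digest()

class CaptchaSessions:
    """Single-use, expiring CAPTCHA sessions keyed by a random session id."""

    def __init__(self, ttl_seconds=120, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._sessions = {}  # session id -> (answer digest, expiry time)

    def start(self, answer):
        sid = secrets.token_urlsafe(16)
        self._sessions[sid] = (_digest(answer), self.clock() + self.ttl)
        return sid

    def verify(self, sid, reply):
        entry = self._sessions.pop(sid, None)  # destroyed whether right or wrong
        if entry is None:
            return False  # unknown, already used, or replayed session
        digest, expires = entry
        if self.clock() > expires:
            return False
        return hmac.compare_digest(digest, _digest(reply))
```

Destroying the session on a wrong answer too means a bot cannot keep guessing against the same challenge.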

Security After Widespread Adoption

Use a large pool of dictionary words or images, along with phonetic generators and nonsense words.


Security from OCR is achieved by randomness:
1. Making the letters wiggly
2. Adding noise or lines
3. Using a messy background
4. Crowding or blending letters
5. Segmenting characters
6. Varying font thickness and color
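The randomness listed above can be drawn once per challenge, per character, before anything is rendered. The sketch below only plans the distortions; the parameter ranges, the 200x60 canvas and the names `GlyphStyle` and `plan_distortion` are hypothetical, and an image renderer (not shown) would draw the result.

```python
import random
from dataclasses import dataclass

@dataclass
class GlyphStyle:
    char: str
    rotation_deg: float  # wiggly letters
    y_offset: int        # vertical jitter
    stroke_width: int    # varying font thickness
    color: tuple         # (r, g, b), varying color
    overlap_px: int      # crowding: overlap with the previous glyph

def plan_distortion(text, rng, noise_lines=3, width=200, height=60):
    """Draw random per-character distortions plus random noise lines.

    All ranges are hypothetical choices for an assumed width x height canvas.
    """
    styles = [
        GlyphStyle(
            char=c,
            rotation_deg=rng.uniform(-30, 30),
            y_offset=rng.randint(-4, 4),
            stroke_width=rng.randint(1, 4),
            color=tuple(rng.randint(0, 160) for _ in range(3)),  # dark enough to read
            overlap_px=rng.randint(0, 5),
        )
        for c in text
    ]
    # Each noise line is (x0, y0, x1, y1) inside the canvas.
    lines = [
        (rng.randrange(width), rng.randrange(height), rng.randrange(width), rng.randrange(height))
        for _ in range(noise_lines)
    ]
    return styles, lines
```

Passing the random generator in explicitly keeps a challenge reproducible for testing, while production code would seed it unpredictably.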

Applications Of CAPTCHA


1. Online Polls
2. Protecting Web Registration
3. Preventing comment spam
4. Search engine bots
5. E-Ticketing
6. Email spam
7. Preventing Dictionary Attacks
8. As a tool to verify digitized books
9. Improving Artificial Intelligence (AI) technology

Benefits of CAPTCHA


Using a CAPTCHA significantly reduces the number of potential attackers on your website. CAPTCHA images ensure that not every beginner hacker can attack your web forms.
You can always change the algorithm used if the previous one is broken. It is highly unlikely that an attacker will keep spending time breaking each new algorithm as you change it.

Limitations of CAPTCHA


CAPTCHA is not a 100% solution to problems such as bots and spam. CAPTCHA can be broken:
1. Using Computer Character Recognition software.
2. Using cheap human labor to solve the tests.



Conclusion

As with all security solutions, risk can only be reduced; no single security measure is 100% safe. Still, a CAPTCHA is necessary whenever you need to enhance the stability and security of a web service or application. In short, a CAPTCHA is a test that can be generated and graded automatically, that a human can pass very easily, but that is not easy for a computer program to pass.