
c c  c c

  c

Abstract

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are
commonly used to determine whether the end user is a human or an automated program. They are
primarily images consisting of noise along with content to be identified, which are presented to
the user, who is expected to identify the content. Humans can identify the content of these images
thanks to their visual capabilities, but automated systems cannot distinguish the content from the
noise; hence CAPTCHAs guard against unwanted access by automated systems. To recognize the
content of such an image, a program needs a CAPTCHA solver. The solver uses image processing
techniques to separate the content from the noise. The entire process executed by the solver is
called reverse engineering the CAPTCHA. Since CAPTCHAs come in different types, different
solvers employing various techniques can be implemented using a variety of image processing
algorithms.

In this project we present the idea of one such CAPTCHA solver, which may be employed to
solve a generic set of text-based visual CAPTCHAs. The project proposes five steps to implement
it. Each step is proposed to use a combination of one or more computer graphics algorithms to
process its input and output a simpler version of the original image for use in the subsequent
steps, ultimately outputting the recognized content of the image, which may be compared with the
input provided by the user to declare success or failure.
Keywords: CAPTCHA, ANN.


Introduction

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are a
measure for increasing the security of websites, which are nowadays commonly used as a medium
of information and corporate exchange. They enhance the reliability of any website. They are
mainly images containing characters, which constitute the content, along with a lot of background
noise. They are dynamically generated, are often used for security purposes, and hence should be
unique: once a CAPTCHA image has been generated, it should never be repeated. CAPTCHAs are
mainly of two types:

1. Visual CAPTCHAs
2. Audio CAPTCHAs

Visual CAPTCHAs are simply images as described above, while audio CAPTCHAs are similar
challenges in which the characters are read out to the user. Audio CAPTCHAs are mainly designed
to let blind and visually impaired users access the same websites despite the use of CAPTCHAs.

These CAPTCHAs come in different types according to their text styles and backgrounds as well
as the noise added to each of them. They provide security in the sense that the system can tell
human users apart from automated systems, which could otherwise be used to hack websites or
retrieve personal information from them.


Fig 1: An example of visual CAPTCHA for Facebook

Fig 2: An example of audio CAPTCHA for passport.


Thus, wherever CAPTCHAs are used, recognizing the content amid the noise automatically requires
a program that breaks the visual CAPTCHA. Such a program is known as a CAPTCHA solver. As
different types of CAPTCHAs are available, different solvers are built for these types. They may
employ computer graphics algorithms or artificial neural networks (ANNs) for the solving process.

The solving process mainly consists of the following 5 stages:

1. Input
2. Preprocessing
3. Segmentation
4. Feature extraction
5. Pattern matching or classification

Whenever the user gives an input in response to a CAPTCHA, it is checked against the
characters recognized by the solver for the same CAPTCHA. If the two match, the input is
accepted; otherwise it is rejected.
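The check described above can be sketched in a few lines of Python. This is only an illustration: the function name verify_response and the choice to ignore case and surrounding whitespace by default are our assumptions, not part of the proposed system.

```python
def verify_response(user_input, recognized_text, case_sensitive=False):
    """Return True if the user's answer matches the text recognized by the solver.

    Hypothetical helper: ignoring case and surrounding whitespace is an assumption.
    """
    answer = user_input.strip()
    expected = recognized_text.strip()
    if not case_sensitive:
        # casefold() is a more aggressive lower() intended for caseless matching
        answer, expected = answer.casefold(), expected.casefold()
    return answer == expected
```

For example, verify_response(" x7Kq ", "X7KQ") accepts the input, while the same call with case_sensitive=True rejects it.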


Motivation

Networking and the use of the World Wide Web (WWW) have increased manyfold in the last few
years. The internet is now easily available at low rates and is used by all for various purposes.
Along with this, the way of life has also changed. E-business, along with electronic
transactions, has led us to an era of improved technology.

Despite improvements in technology, not all means of information exchange are safe. Moreover,
thanks to this very technology, information is available at our fingertips, in abundance and
with considerable ease. Not all of this information is put to good and ethical use; it may be
misused by anyone.

Thus there is a need for security measures in websites and other software so as to provide a
relatively safe environment in which to use the available resources. These security measures
should be hassle-free so as to give the user a stress-free and uncomplicated environment to
work in. CAPTCHA is one such security application.

The use of these different CAPTCHAs and their importance in today's networking led us to
research and study them. Although a lot of work has been researched and implemented in this
area, a lot still remains to be done. This gave us the idea to try to implement these hitherto
unimplemented areas in the above-mentioned field. The possibility of using Artificial Neural
Networks has also given us a push in the right direction to work in this domain.


Proposed System
In this project we propose to develop a solver for text-based visual CAPTCHAs. The solver is
intended to consist of five stages, whose working can be summarized as follows:

1. Input
The solver takes as input an image containing letters and/or digits along with digital noise.
The image is supplied as a standard image file, which is then used by the later stages of the
solver.
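As a minimal sketch of the input step, the snippet below parses an image into rows of RGB pixels. It assumes, purely for illustration, that the image is supplied as an ASCII PPM (P3) file; the function name read_ppm_ascii is hypothetical, and the project does not prescribe a particular file format.

```python
def read_ppm_ascii(text):
    """Parse an ASCII PPM (P3) image into a list of rows of (r, g, b) tuples.

    Assumption: the CAPTCHA is available as P3 text; samples are rescaled to 0..255.
    """
    # Strip comments (everything after '#') and collect whitespace-separated tokens.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if len(tokens) < 4 or tokens[0] != "P3":
        raise ValueError("not an ASCII PPM (P3) image")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    values = [round(int(t) * 255 / maxval) for t in tokens[4:]]
    if len(values) != width * height * 3:
        raise ValueError("pixel data does not match the declared size")
    # Group the flat sample list into pixels, then the pixels into rows.
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

The nested-list representation is chosen only so that the later sketches need no external libraries.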

2. Preprocessing
This step directly follows the input step and is the first stage in which actual processing is
done on the original input file. This stage mainly performs background noise removal, rendering
the image in binary format. This can be done stepwise, in which case grayscale images are
created before the binary images. The background noise may be of different types, so a different
removal algorithm has to be implemented for each type of noise to make noise removal versatile.
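The stepwise route (colour to grayscale to binary) followed by one simple noise filter could look roughly as follows. This is a sketch under our own assumptions: the threshold of 128, the BT.601 luma weights, dark ink on a light background, and the isolated-pixel filter are illustrative choices, and real CAPTCHAs would need noise-specific algorithms as noted above.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0..255 gray levels (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def to_binary(gray_image, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]

def remove_isolated_pixels(binary):
    """Clear ink pixels that have no ink among their (up to 8) neighbours."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 1:
                # Count ink in the 3x3 window, then subtract the pixel itself.
                neighbours = sum(
                    binary[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                ) - 1
                if neighbours == 0:
                    cleaned[y][x] = 0
    return cleaned
```

Removing isolated pixels handles only speckle noise; lines or textured backgrounds would require other filters.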

3. Segmentation
This is the second processing step on the input image and brings it one step closer to the
output. This stage segments the image into a number of glyphs such that each one represents a
single letter or digit. These glyphs can be obtained using any of a range of readily available
segmentation algorithms, chosen according to the type of input.
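One of the simplest segmentation approaches is a vertical projection: count the ink pixels in each column and cut the image wherever a column is empty. The sketch below assumes a binary image stored as rows of 0/1 values with characters that do not touch; the function name segment_by_columns is hypothetical, and this is just one candidate algorithm rather than the one the project commits to.

```python
def segment_by_columns(binary):
    """Split a binary image into glyphs at columns that contain no ink.

    Returns a list of glyphs, each a list of rows covering one run of inked columns.
    """
    height, width = len(binary), len(binary[0])
    column_ink = [sum(binary[y][x] for y in range(height)) for x in range(width)]
    glyphs = []
    start = None
    # A trailing 0 acts as a sentinel so that a glyph touching the right edge is closed.
    for x, ink in enumerate(column_ink + [0]):
        if ink and start is None:
            start = x  # first inked column of a new glyph
        elif not ink and start is not None:
            glyphs.append([row[start:x] for row in binary])
            start = None
    return glyphs
```

Characters that overlap or are joined by noise would come out as a single glyph, which is why other segmentation algorithms may be needed for some inputs.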

4. Feature extraction


This stage strives to bring about uniformity in the storage space used for each character in a
given language. To attain this, thinning algorithms are applied so as to retain a glyph's
information within the minimum possible amount of storage memory. It may also employ other
techniques such as skeletonization and scaling to bring about uniformity of storage. It also
computes, for each glyph, a probability value of its being a certain character.
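The scaling part of this stage can be illustrated as below. Note that this sketch does not implement thinning or skeletonization; it only crops each glyph to its ink and rescales it to a fixed grid using nearest-neighbour sampling, so that every glyph occupies the same amount of storage. The 8x8 default size and the function names are our assumptions.

```python
def crop_to_ink(glyph):
    """Trim empty rows and columns around the ink of a binary glyph."""
    rows = [y for y, row in enumerate(glyph) if any(row)]
    cols = [x for x in range(len(glyph[0])) if any(row[x] for row in glyph)]
    if not rows:
        return [[0]]  # blank glyph: return a single background pixel
    return [row[cols[0]:cols[-1] + 1] for row in glyph[rows[0]:rows[-1] + 1]]

def scale_nearest(glyph, size=8):
    """Resample a glyph to size x size using nearest-neighbour sampling."""
    height, width = len(glyph), len(glyph[0])
    return [[glyph[y * height // size][x * width // size] for x in range(size)]
            for y in range(size)]

def feature_vector(glyph, size=8):
    """Flatten the cropped, rescaled glyph into a fixed-length list of 0/1 values."""
    scaled = scale_nearest(crop_to_ink(glyph), size)
    return [pixel for row in scaled for pixel in row]
```

Because every feature vector has size * size entries, glyphs of different original dimensions can be compared element by element in the next stage.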

5. Pattern matching or classification


This is the last working stage of the proposed system. This stage takes as input the
probability values computed in the previous stage for each glyph created in the segmentation
stage. Each glyph is separately compared with each of the letters stored in the database, and
the letter with the highest probability is chosen as the recognized letter. Alternatively,
thresholding can be used to decide upon the cut-off limit for pattern matching.
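A simple way to realize this comparison is template matching: score the glyph against every stored letter and keep the best one, optionally rejecting it when the score falls below a cut-off. In the sketch below the fraction of agreeing pixels stands in for the probability mentioned above, and the templates dictionary plays the role of the database; both are illustrative assumptions, as are the function names.

```python
def similarity(features, template):
    """Fraction of positions at which two equal-length 0/1 vectors agree."""
    matches = sum(1 for p, q in zip(features, template) if p == q)
    return matches / len(features)

def classify(features, templates, threshold=0.0):
    """Return (label, score) for the best-matching template.

    The label is None when the best score is below the threshold.
    """
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        score = similarity(features, template)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score
```

For instance, with 3x3 templates for "I" and "L", a glyph that differs from "I" in one pixel is classified as "I" with a score of 8/9, but is rejected if the threshold is raised to 0.95. A trained ANN could replace the similarity function without changing the surrounding logic.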

Task                            Start date  End date  Effort
Need Analysis                   21/06/10    30/06/10  40
Feasibility study               01/07/10    07/07/10  22
Scope determination             07/07/10    10/07/10  08
Literature survey               11/07/10    24/07/10  32
Scripting determination         25/07/10    26/07/10  05
Documentation                   26/07/10    30/07/10  12

Functional requirements         01/08/10    02/08/10  08
Database design                 03/07/10    24/08/10  80
Detail Design                   26/08/10    23/09/10  80
Review                          24/09/10    26/09/10  16

Task Tracking                   27/09/10    02/10/10  40
Status Reporting                03/10/10    07/10/10  24
Change and Scope mgmt           08/10/10    14/10/10  40

Module coding                   15/12/10    25/01/11  160
Unit testing                    01/02/11    22/02/11  60
Test Data                       23/02/11    28/02/11  40
Integration with other module   02/03/11    30/03/11  80

Black Box Integration Testing   01/04/11    06/04/11  60

Presentation to internals       07/04/11    10/04/11  16

Project documentation           11/04/11    27/04/11  30


