
http://technicaljournals.org
Vol 03, Issue 01; January-April 2012 International Research Journal of Signal Processing - IRJSP

ISSN: 2249 6505

Handwriting analysis using graphology


NIAMATHULLA . SK
M.Tech in Real Time Systems, Aurora Technological and Research Institute, Uppal, Hyderabad
niamath407@gmail.com

ABSTRACT
Graphology is an important tool for writer identification and for questioned-document examination. It is applied in many types of investigation, such as judicial matters, fraud, suicide and others. The system described here works in offline mode on an image database of Spanish words written by different individuals. All characteristics extracted from the images are used for writer identification. To computerize this process, we considered different graphological features in order to identify the handwriting of a specific writer among the different writers in our database. In particular, we propose a reliable and automated system that determines whether a document is dubitable or indubitable with respect to the writer. The graphological features that define such a profile are measured on the document and, after extraction, the data are classified using WED and K-NN.

Keywords: graphology, document, segmentation, pattern recognition, manuscript, WED.

1. INTRODUCTION
Graphology is the study of handwriting (1871). It helps to reveal the character and personality of the writer, including his or her strengths, weaknesses and abilities. Because handwriting comes from the unconscious, it contains a great deal of information that can be useful for interpreting a person's character. ELANE describes graphology as "brain writing": the handwriting comes directly from the writer in a uniquely personal and individual way, irrespective of how the person was taught to write. An expert graphologist understands the styles of different countries and languages and makes allowances for taught influences. This paper deals with detecting the handwriting of a specific person among the different writers in our database, a task of great importance for calligraphic know-how and in judicial matters.
The first step is to teach the computer what handwriting is and how to segment it. At present, such work is only quasi-automated and relies on few parameters (features), so a handwriting expert must still take part in analysing the results. Automation has two advantages: the expert's analysis of manuscripts is automated, and additional information is added to the professional verdict. It is therefore necessary to create a reliable and automated system that produces an expert report and finds the writer from features such as skew, slant and pressure. Based on this idea, the present work describes a writer recognition system that distinguishes, with a high degree of success, among the different writers in our database.

Many features can be used to identify the handwriting of a specific writer, such as slant, skew, size, pressure, upper zone (or case), lower zone, word spacing, line spacing, page margins, middle zone, garland, angle, thread, wavy line and many others. The proposed system uses only 8 of these features: the size of letters, the slant of words and letters, the baseline, the pen pressure, and the spacing between letters and between words, as they are enough to identify the person. The main focus of this paper is segmentation, feature extraction and classification. All the features are extracted automatically from the digital image of the handwriting. The extracted features are then input to the WED classifier. The proposed algorithm is simple and easy to implement, and MATLAB is the tool used for it.

2. PROPOSED METHODOLOGY
The main contribution of this paper is writer identification from sample images among the different writers in our database. The identification is carried out using the graphological features (parameterization) found in the person's handwriting.
There have been many works in this field [7, 8, 9], which share three main steps: pre-processing, feature extraction, and classification. Following the same approach, to automate the procedure the steps shown in figure 1 need to be followed.

2.1 Image handwriting acquisition & database
The handwriting image samples used in this research were collected digitally by scanning the handwriting of 50 different writers. Each writer was asked to write a simple text document of 30-40 words in running hand. Most of the handwriting is cursive, but a few samples are printed handwriting. The samples in our database satisfy the following three conditions: (i) the same pen type was always used (ball point with black ink), (ii) all users wrote the same group of images from a restricted lexicon, and the images were processed off-line, and

2011 - TECHNICALJOURNALS, Peer Reviewed International Journals-IJCEA, IJESR, RJCSE, PAPER, ERL, IRJMWC, IRJSP, IJEEAR, IJCEAR, IJMEAR, ICEAR, IJVES, IJGET, IJBEST TJ-PBPC, India; Indexing in Process - EMBASE, EmCARE, Electronics & Communication Abstracts, SCIRUS, SPARC, GOOGLE Database, EBSCO, NewJour, Worldcat, DOAJ, and other major databases etc.,


(iii) each user wrote his or her samples on 80 g/m² DIN A4 paper, using a rigid support surface, over one week under discontinuous conditions, that is, on different days of the week. These discontinuous conditions give our database an effect of temporal invariance.

Fig. (1)

2.2 Image handwriting pre-processing
In the pre-processing stage, the handwriting image is processed to eliminate salt-and-pepper noise, to remove punctuation marks, and to resize the samples and bring them to the correct orientation.
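The paper does not say which filter removes the salt-and-pepper noise. A median filter is a common choice for this kind of noise, so the sketch below assumes one. It is written in Python for illustration (the paper itself used MATLAB), and the function name `median_filter_3x3` and the list-of-lists image representation are hypothetical.

```python
from statistics import median


def median_filter_3x3(image):
    """Return a filtered copy of a 2-D grey-level image (list of lists).

    Each interior pixel is replaced by the median of its 3x3 neighbourhood,
    which suppresses isolated salt (bright) and pepper (dark) pixels.
    Border pixels are copied unchanged to keep the sketch short.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is not modified
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1)
                      for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out
```

An isolated bright pixel on a dark background is outvoted by its eight neighbours and disappears, while larger connected strokes survive.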

2.3 Handwriting Segmentation
In handwriting image segmentation, the digital handwriting is divided into three different types of segments, i.e. words, letters and lines, each used for different processing.

2.3.1 Word Segmentation
Word segmentation separates the words in the digital handwriting document so that the features related to words can be calculated.

2.3.2 Line Segmentation
Line segmentation is used to find the baseline features. The segmented images are then processed to calculate numerical values that are used to classify the writer among the different writers in our database.
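The paper does not specify its segmentation algorithm. A widely used approach for line segmentation is the horizontal projection profile: rows containing ink belong to a text line, and empty rows separate lines. The sketch below assumes this technique and a binary image with 1 for ink and 0 for background; the name `segment_lines` and the `min_ink` threshold are hypothetical.

```python
def segment_lines(binary_image, min_ink=1):
    """Split a binary page (1 = ink, 0 = background) into text lines.

    Returns a list of (first_row, last_row) pairs, both inclusive.
    A row counts as part of a line when it holds at least `min_ink`
    ink pixels; runs of such rows form one line.
    """
    profile = [sum(row) for row in binary_image]  # ink count per row
    lines = []
    start = None
    for r, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = r                      # a new line begins
        elif ink < min_ink and start is not None:
            lines.append((start, r - 1))   # the current line ends
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines
```

Applying the same idea to the columns of a single line (a vertical projection profile) gives candidate gaps between words, under the assumption that inter-word gaps are wider than intra-word gaps.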

2.3.3 Letter Segmentation
Letter segmentation is performed on each letter of every word in the digital handwriting document of each individual. It is used to calculate the letter-related features that identify the specific writer, as shown in the figure below.
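The paper does not describe how letters are separated. One common starting point, and the quantity that Section 2.4.6 relies on, is connected-component labelling of the ink pixels inside a word. The sketch below assumes 8-connectivity and a binary word image (1 = ink); the function name `count_components` is hypothetical.

```python
from collections import deque


def count_components(binary_image):
    """Count 8-connected groups of ink pixels (1 = ink) in a 2-D list.

    A word written without lifting the pen yields one component;
    fully separated letters yield one component per letter (or more,
    for letters with detached parts such as dots).
    """
    rows, cols = len(binary_image), len(binary_image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary_image[r][c] != 1 or seen[r][c]:
                continue
            count += 1                     # found a new component
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:                   # breadth-first flood fill
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary_image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count
```

Cursive writing usually needs further splitting inside a component, which this sketch does not attempt.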


2.4 Features Detection
Feature extraction is a dimensionality-reduction technique: high-dimensional input data are transformed into a reduced output, represented as a feature vector. In our case, the features are the eight factors on which the identification of a specific writer is based. They are explained in more detail below.

2.4.1 Skew Angle and heights of three main handwriting zones
The handwriting of a person can be described by three main handwriting zones: the upper zone, the middle zone and the lower zone, as shown in the figure. The skew angle is the angle between the baseline and the x-axis. To compute the heights of the three main handwriting zones, the baseline is first estimated with a method based on the vertical histogram of pixels. However, this method works only for non-skewed text lines. To estimate the baseline for skewed text lines, the angular histogram method is used instead. The skew angle is then computed by determining the best-fit line through the minima points. This line can be represented in the form y = mx + c. Angled reference lines are then computed from the angular histogram at the skew angle. The heights of the three main writing zones are calculated from the heights of the topline, upper line, baseline and bottom line. Ratios of these three heights are used as the features.

2.4.2 Width of the writing
The average width of the writing, which can be considered as a feature, is determined by finding the row with the maximum number of transitions from black pixel to white pixel and vice versa. In this row, the length of each run of white pixels between two black pixels is computed. Since these lengths include the inter-word spacing as well, the median of these lengths is taken, which gives the width of the writing:

W = median(l1, ..., li, ..., ln)

2.4.3 Slant Angle
The slant of the handwriting, which can be considered as a feature, is defined as the angle of the characters with the y-axis, as shown in the figure. For the slant estimation, a contour detection algorithm is applied. The contours are approximated by straight line segments. The gradient of each straight line corresponds to a column of a one-dimensional histogram. The length of each line is taken as its weight when it is inserted in this histogram. The slant of the writing is then either the peak of the histogram or the average slant angle of some of the highest columns.
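The width and slant computations of Sections 2.4.2 and 2.4.3 can be sketched as follows. This is an illustrative Python version, not the author's MATLAB code. It assumes a binary image with 1 for ink, takes the contour segments as already extracted, and measures the slant from the vertical with y increasing upward. The names `writing_width`, `slant_angle` and the 5-degree `bin_width` are hypothetical choices.

```python
import math
from statistics import median


def writing_width(binary_image):
    """Median length of white runs between ink pixels, taken on the row
    with the most black/white transitions (1 = ink, 0 = background)."""
    def transitions(row):
        return sum(1 for a, b in zip(row, row[1:]) if a != b)

    row = max(binary_image, key=transitions)
    runs, run, seen_ink = [], 0, False
    for pixel in row:
        if pixel == 1:
            if seen_ink and run > 0:
                runs.append(run)   # white run enclosed by ink on both sides
            seen_ink, run = True, 0
        elif seen_ink:
            run += 1
    return median(runs) if runs else 0


def slant_angle(segments, bin_width=5):
    """Peak of a length-weighted histogram of segment angles.

    `segments` is a list of ((x1, y1), (x2, y2)) straight-line pieces of
    the contour. Angles are in degrees from the vertical (0 = upright).
    Returns the centre of the heaviest bin, or None without segments.
    """
    histogram = {}
    for (x1, y1), (x2, y2) in segments:
        dx, dy = x2 - x1, y2 - y1
        if dy < 0 or (dy == 0 and dx < 0):
            dx, dy = -dx, -dy      # make the direction of drawing irrelevant
        angle = math.degrees(math.atan2(dx, dy))
        index = math.floor(angle / bin_width)
        histogram[index] = histogram.get(index, 0.0) + math.hypot(dx, dy)
    if not histogram:
        return None
    peak = max(histogram, key=histogram.get)
    return (peak + 0.5) * bin_width
```

Weighting by length means that one long, upright stroke counts for more than several short, slanted fragments. This sketch implements only the first of the two options in the text (the histogram peak); averaging the highest columns would be a small extension.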

2.4.4 Pressure
Pressure is quantified as the width of the stroke, which depends on the force exerted with the writing instrument (pen). Several initial tests gave low success rates during the classification stage. Because of this, and independently of the system proposed here, we carried out a preliminary study on 10 samples from 4 different writers. The study measured the pressure as a function of the resolution used when acquiring the samples (scanned at 100, 150, 200, 300, 600 and 1200 dpi) and compared two estimates of the pressure: the statistical mode of the line widths and the weighted mean of the line widths. From this study we concluded that the pressure is best estimated as the statistical mode, and that high resolution does not yield additional discriminative information, because with both estimates the highest success rate was reached at 150 dpi. The study was repeated 10 times, so the results are reported as mean and variance. The experiment used the weighted Euclidean distance as classifier, with 50% of our database used for training and the rest for testing.

2.4.5 Correlation
The correlation parameter between two images (A(x,y) and B(x,y)) of the same word was obtained by applying the correlation and the convolution in two dimensions (see Equations 1 and 2, respectively) and taking the x-coordinate (abscissa) at which the maximum similarity occurs. This measure gives the degree of resemblance between two samples of the same word, taken at different times and written by the same writer.

2.4.6 Union of letters
The union of letters describes how each writer joins the letters, and was measured as a function of the connected components inside the word. It discriminates between writers who write the letters of a word completely joined, partially joined, or completely separated.

2.4.7 Baseline
The baseline in the handwriting can be calculated by using the equation shown below:

2.4.8 Thinning area
The last parameter is the area of the words, which handwriting experts usually measure as the area of the minimum bounding box of the word. In this work, however, better discriminatory results were obtained with another procedure, which calculates the area occupied by the foreground pixels of the thinned word. Beforehand, the punctuation marks must be removed from the word. This was done with thresholds, with a success rate of 99% over our whole database. The word then undergoes three image-processing operations: thinning, dilation, and thinning again.

3. CLASSIFICATION SYSTEM
Two classifiers were considered, namely the weighted Euclidean distance (WED) classifier and the nearest neighbour classifier (K-NN).

3.1 Weighted Euclidean Distance (WED) Classifier
Representative features for each writer are determined from the features extracted from the training handwriting texts of that writer. Then, for an unseen handwritten text block by an unknown writer (who has contributed training images), the same feature extraction operations are carried out. The extracted features are compared with the representative features of the set of known writers. The writer of the handwriting is identified as writer K by the WED classifier if and only if the following distance function is a minimum at K:

$$d(k) = \sum_{n=1}^{N} \frac{\left(f_n - F_n^{(k)}\right)^2}{\left(V_n^{(k)}\right)^2}$$

where $f_n$ is the nth feature of the input document, and $F_n^{(k)}$ and $V_n^{(k)}$ are the sample mean and sample standard deviation of the nth feature of writer k, respectively.

3.2 The Nearest Neighbour Classifier (K-NN)
When using the nearest neighbour classifier (K-NN), the ideal feature vector of each class k in the training set is given as $f_k$. The features of the unknown writer are then detected and measured, and represented as U. To determine the class of the writer, the similarity with each class is measured by computing the distance between the feature vectors $f_k$ and U. The distance measure used here is the Euclidean distance. The distance $d_k$ of the unknown writer from class k is given by

$$d_k = \sqrt{\sum_{j=1}^{N} \left(U_j - f_{kj}\right)^2}$$

where j = 1, ..., N (N is the number of features considered). The writer is then assigned to the class R such that:

$$d_R = \min_{k} d_k$$


where k = 1, ..., number of classes. Other, more sophisticated measures and classifiers, such as neural network classifiers, could have been used. The emphasis in this paper, however, is on computational simplicity.

4. EXPERIMENTS AND RESULTS
We conducted the experiments in two steps. The first step, A, uses two-thirds of the data samples as training data and one-third as test data. The second step, B, uses one sample as test data and all the others to train the system. The results in both cases are nearly the same, with a difference of about 3%. In the first step, each of the 40 writers was asked to write the test documents, which were saved in digital form. The features of each individual were then analysed. Two-thirds of the handwriting samples and their feature results were used to train the system, and the remaining one-third were used to test the accuracy. This gives an accuracy of 90.3%. In the second step, more or less the same technique is applied, with a single sample used as the test sample. In this case the accuracy rose from 90.3% to 95.07%, which is a very good result.
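The WED decision rule of Section 3.1 and the leave-one-out protocol of step B can be sketched as below. This is a minimal Python illustration under stated assumptions, not the MATLAB code behind the reported accuracies. It assumes each writer has at least three samples so that a standard deviation can still be estimated after one is held out. The names `train_profiles`, `wed_distance`, `classify` and `leave_one_out_accuracy`, and the small constant used to avoid division by zero, are hypothetical.

```python
from statistics import mean, stdev


def train_profiles(samples):
    """samples: dict writer -> list of feature vectors (two or more each).
    Returns dict writer -> (per-feature means, per-feature std devs)."""
    profiles = {}
    for writer, vectors in samples.items():
        columns = list(zip(*vectors))              # one tuple per feature
        means = [mean(col) for col in columns]
        # A constant feature has zero deviation; substitute a tiny value
        # (an assumption of this sketch) so the division stays defined.
        stds = [stdev(col) or 1e-9 for col in columns]
        profiles[writer] = (means, stds)
    return profiles


def wed_distance(features, profile):
    """Weighted Euclidean distance: sum of squared differences from the
    writer's means, each divided by that feature's variance."""
    means, stds = profile
    return sum((f - m) ** 2 / s ** 2
               for f, m, s in zip(features, means, stds))


def classify(features, profiles):
    """Return the writer whose profile minimises the WED."""
    return min(profiles, key=lambda w: wed_distance(features, profiles[w]))


def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, train on the rest, and test on it."""
    correct = total = 0
    for writer, vectors in samples.items():
        for i, held_out in enumerate(vectors):
            reduced = dict(samples)
            reduced[writer] = vectors[:i] + vectors[i + 1:]
            if classify(held_out, train_profiles(reduced)) == writer:
                correct += 1
            total += 1
    return correct / total
```

Setting every standard deviation to 1 turns `wed_distance` into the squared Euclidean distance, which has the same minimiser as the distance used by the nearest neighbour rule of Section 3.2.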

S.No   Doc. ID   Suspected Doc (S)   Distance   Accuracy (%)
1      09        S                   0.45       88.5
2      10        S                   0.92       91.1
3      13        S                   0.77       90.0
4      16        S                   0          95.6
5      18        S                   0.10       85.4

5. CONCLUSION
In this work, the problem of writer identification based on handwriting has been studied. Unlike signature verification and other offline writer identification approaches, this approach is text independent. Features that characterize a writer are extracted from text lines and then used for identification.

ACKNOWLEDGMENT
This work was supported by an investigation scholarship Catedra Telefonica - ULPGC provided by the Spanish telephony operator Telefonica in the call 2007 research works.

6. REFERENCES
[1] Carlos F. Romero, Carlos M. Travieso, Jesus B. Alonso and Miguel A. Ferrer, Using off-line handwritten text for writer identification, WSEAS Transactions on Signal Processing, Issue 1, Vol. 3, pp. 56-61, January 2007.
[2] D. Valkaniotis, J. Sirigos, N. Antoniales and N. Fakotakis, Text-independent off-line writer recognition using NNs, ICESC 96, pp. 692-695.
[3] C. Bishop, Neural Networks for Pattern Recognition, Clarendon, UK: Oxford University Press, 1996.
[4] Anil Jain, Ruud Bolle and Sharath Pankanti, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, 1999.
[5] Lambert Schomaker, Writer Identification and Verification, Grote Kruisstr. 2/1, 9712 TS, Groningen, Netherlands.
[6] Graphology Handwriting Analysis, http://www.businessballs.com/graphologyhandwritinganalysis.htm
[7] Ricard Coll, Alicia Fornes and Josep Llados, Graphological Analysis of Handwritten Text Documents for Human Resources Recruitment, Universitat Autonoma de Barcelona, Spain.
[8] S.N. Srihari, S. Cha, H. Arora and S. Lee, Individuality of handwriting, Journal of Forensic Sciences, 47(4), 2002.
[9] A. Vinciarelli and J. Luettin, A new normalization technique for cursive handwritten words, Pattern Recognition Letters, 22, 2001.
[10] R. Gardner, Instant Handwriting Analysis: A Key to Personal Success, Llewellyn Publications, 2002.
[11] Omar Santana and Carlos M. Travieso, Writer Identification Based on Graphology Techniques, Universidad de Las Palmas de Gran Canaria.

AUTHOR BIOGRAPHY
I am Niamathulla SK, a student of JNTU at Aurora Technological and Research Institute, Uppal, Hyderabad. I have worked as an Assistant Professor and am interested in carrying out various projects and paper presentations.
