Academic Documents
Professional Documents
Cultural Documents
1.1 INTRODUCTION
Full text recognition is in most cases not yet available, except for printed
documents for which dedicated OCR can be developed. However, invaluable
collections of historical documents are already digitized and indexed for
consulting, exchange and distant access purposes which protect them from
direct manipulation. In some cases, highly structured editions have been
established by scholars. But a huge amount of documents are still to be exploited
electronically. To produce an electronic searchable form, a document has to be
indexed. The simplest way of indexing a document consists in attaching its main
characteristics, such as date, place and author (the so-called ‘metadata’).
Indexing can be enhanced when the document structure and content are
exploited. When a transcription (published version, diplomatic transcription) is
available, it can be attached to the digitized document: this allows users to
retrieve documents from textual queries. Since text based representations do not
reflect the graphical features of such documents, a better representation is
obtained by linking the transcription to the document image. A direct
correspondence can then be established between the document image and its
content by text/image alignment techniques. This allows the creation of indexes
where the position of each word can be recorded, and of links between both
representations.
The purpose of this project is to survey the efforts made for historical
documents on the text line segmentation task. Section 2 describes the
characteristics of text line structures in historical documents and the different
ways of defining a text line. Preprocessing of document images (gray level, color
or black and white) is often necessary before text line extraction, to prune
superfluous information (non-textual elements, textual elements from the verso)
or to correctly binarize the image.
Baseline: Fictitious line which follows and joins the lower part of the character
bodies in a text line (Fig. 2)
Median line: Fictitious line which follows and joins the upper part of the
character bodies in a text line.
Line spacing: Widely spaced lines are easy to find. The process of extracting
text lines grows more difficult as interlines narrow: the lower baseline of the
first line comes closer to the upper baseline of the second line, and
descenders and ascenders start to fill the blank space left for separating two
adjacent text lines.
Insertions: Words or short text lines may appear between the principal text
lines, or in the margins.
Stroke fragmentation and merging: Punctuation, dots and broken strokes due
to low-quality images and/or binarization may produce many connected
components; conversely, words, characters and strokes may be split into several
connected components. The broken components are no longer linked to the
median baseline of the writing and become ambiguous and hard to segment into
the correct text line.
Separating paths and delimited strip: Separating lines (or paths) are
continuous fictitious lines which can be uniformly straight, made of straight
segments, or of curving joined strokes. The delimited strip between two
consecutive separating lines receives the same text line label, so a text line
can be represented by a strip together with its pair of separating lines (Fig. 3).
Clusters: Clusters are a general set-based way of defining text lines. A label is
associated with each cluster. Units within the same cluster belong to the same
text line. They may be pixels, connected components, or blocks enclosing pieces
of writing. A text line can be represented by a list of units with the same label.
Strings: Strings are lists of spatially aligned and ordered units. Each string
represents one text line.
Baselines: Baselines follow line fluctuations but partially define a text line. Units
connected to a baseline are assumed to belong to it. Complementary processing
has to be done to cluster non-connected units and touching components.
Fig. 3 Various text line representations: paths, strings and baselines.
1.3 DOCUMENT IMAGE ANALYSIS
(1) Typical documents in today’s office are computer-generated, but they are
inevitably produced by different computers and software, so even their electronic
formats are incompatible. Some include formatted text and tables as well as
handwritten entries. There are different sizes, from a business card to a large
engineering drawing. Document analysis systems recognize types of documents,
enable the extraction of their functional parts, and translate from one computer
generated format to another.
However, OCR was still in its infancy at the time and did not perform as
acceptably as MICR. The advantage of MICR was that it is relatively impervious
to change, fraudulent alteration and interference from non-MICR inks. The "eye"
of early OCR equipment utilized lights, mirrors, fixed slits for the reflected light to
pass through, and a moving disk with additional slits. The reflected image was
broken into discrete bits of black and white data, presented to a photo-multiplier
tube, and converted to electronic bits.
OCR has never achieved a read rate that is 100% perfect. Because of
this, a system which permits rapid and accurate correction of rejects is a major
requirement. Exception item processing is always a problem because it delays
the completion of the job entry, particularly the balancing function. Of even
greater concern is the problem of misreading a character (substitutions). In
particular, if the system does not accurately balance dollar data, customer
dissatisfaction will occur. The success of any OCR device to read accurately
without substitutions is not the sole responsibility of the hardware manufacturer.
Much depends on the quality of the items to be processed.
Through the years, the desire has been to increase the accuracy of
reading: to reduce rejects and substitutions, to reduce the sensitivity of
scanning so that less-controlled input can be read, to eliminate the need for
specially designed fonts (characters), and to read handwritten characters.
However, today's systems, while much more forgiving of printing quality and
more accurate than earlier equipment, still work best when specially designed
characters are used and attention to printing quality is maintained. However,
these limits are not objectionable in most applications, and the number of
dedicated users of OCR systems grows each year. But the ability to read a
special character is not, by itself, sufficient to create a successful system.
2. Input was a carbon imprinted document. However, if the carbon was wrinkled,
the imprinter was misaligned, or any one of a variety of reasons existed, the
imprinted characters were impossible to read accurately.
3. To compensate for this problem, the processing system permitted direct key
entry of the failed-to-read items at a fairly high speed. Directly keyed items from the
misread document were under intelligent computer control which placed the
proper data in the right location for the data record. Important considerations in
designing the system encouraged the use of modulus controlled check digits for
the embossed credit card account number. This, coupled with tight monetary
controls by batch totals, reduced the chance of read substitutions.
4. The output of these early systems provided a "country club" type of billing.
That is, each of the credit card sales slips was returned to the original
purchaser. This provided the credit card customer with the opportunity to review
his own purchases to ensure the final accuracy of billing. This has been a very successful
operation through the years. Today's systems improve the process by increasing
the amount of data to be read, either directly or through reproduction of details on
the sales draft. This provides customers with a "descriptive" billing statement
which itemizes each transaction. Attention to the details of each application step
is a requirement for successful OCR systems.
PREPROCESSING
SEGMENTATION
RECOGNITION
POST PROCESSING
1.4.6.1 PREPROCESSING
1.4.6.3 RECOGNITION
All OCR systems include an optical scanner for reading text, and
sophisticated software for analyzing images. Most OCR systems use a
combination of hardware (specialized circuit boards) and software to recognize
characters, although some inexpensive systems do it entirely through software.
Advanced OCR systems can read text in a large variety of fonts, but they still have
difficulty with handwritten text.
The potential of OCR systems is enormous because they enable users to
harness the power of computers to access printed documents. OCR is already
being used widely in the legal profession, where searches that once required
hours or days can now be accomplished in a few seconds.
An OCR engine outputs not only candidate characters, but also candidate
distance information of each candidate character, which is also important in OCR
post-processing. Currently, candidate distance is usually transformed to reliability
of the corresponding candidate character to be utilized. Generally speaking, the
bigger the reliability of a candidate character, the smaller the corresponding
candidate distance. In the early period, the reliability was calculated using some
empirical formulas. Afterwards, a statistical approach was proposed, which
calculates the reliability according to the distribution of candidate characters and
correct characters with different candidate distances. It reflects some statistical
characteristics, and its complexity is low, therefore it achieves good results in
some applications. However, the use of candidate distance is still limited in OCR
post-processing.
1.4.7 STEPS INVOLVED IN OCR
• Binarization,
• Noise removing,
• Thinning,
• Skew detection and correction,
• Line segmentation,
• Word segmentation, and
• Character segmentation
Recognition consists of
• Feature extraction,
• Feature selection, and
• Classification
Fig 6 Steps in an OCR
1.4.6.1 Binarization
1.4.6.3 Thinning
Fig 8 A character image (a) before thinning, and (b) after thinning
1.4.6.4 Skew detection and correction
Fig 9 An image (a) with skew, (b) without skew, and its horizontal profiles
There exist many techniques for skew estimation. One skew estimation
technique is based on the projection profile of the document; another class of
approach is based on nearest neighbor clustering of connected components.
Techniques based on the Hough transform and Fourier transform are also
employed for skew estimation. A popular method for skew detection employs the
projection profile. A horizontal projection profile is a one-dimensional array where
each element denotes the number of black pixels along a row in the image.
Scanned horizontally, the horizontal projection profile has peaks whose widths
are equal to the character height and valleys whose widths are equal to the
spacing between lines. At the correct skew angle, since scan lines are aligned to
text lines, the projection profile has maximum height peaks for text and valleys
for line spacing. For the image in figure 9(a), the horizontal projection profile
shows no clear valleys due to the presence of skew. Figure 9(b) is an image
in which the skew is removed. The peaks and valleys in the projection profile can
be clearly seen.
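As a rough illustration of this idea, the following sketch scores candidate skew angles by the variance of the horizontal projection profile (sharp peaks and deep valleys give high variance at the correct angle). It assumes a binary NumPy image with 1 = black, approximates small rotations by a per-column vertical shear, and the helper names are ours:

```python
import numpy as np

def horizontal_profile(img):
    # one entry per row: the number of black (1) pixels in that row
    return img.sum(axis=1)

def deskew_score(img):
    # at the correct skew angle the profile has the sharpest peaks and
    # valleys, which maximizes its variance
    return np.var(horizontal_profile(img))

def estimate_skew(img, angles):
    # approximate a small rotation by vertically shearing each column,
    # and return the candidate angle whose sheared image scores highest
    h, w = img.shape
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        sheared = np.zeros_like(img)
        for x in range(w):
            s = int(round(x * np.tan(np.radians(a))))
            if 0 <= s < h:
                sheared[s:, x] = img[:h - s, x]
            elif -h < s < 0:
                sheared[:h + s, x] = img[-s:, x]
        score = deskew_score(sheared)
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

Applying `estimate_skew` with the negative of a document's skew angle flattens its text lines, after which the profile's valleys can be used for line separation as described below.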
After the tilt is corrected, the text has to be segmented first into lines; each
line then into words; and finally each word has to be segmented into its
constituent characters. Horizontal projection of a document image is most
commonly employed to extract the lines from the document. If the lines are well
separated and not tilted, the horizontal projection will have separated peaks
and valleys, as shown in figure 9(b), which serve as the separators of the text
lines. Figure 9 shows an image consisting of 3 text lines (left), and the 3
segmented lines (right), using horizontal projection profiles.
Similarly a vertical projection profile gives the column sums. One can
separate lines by looking for minima in horizontal projection profile of the page
and then separate words by looking at minima in vertical projection profile of a
single line. Figure 11(a) shows a line consisting of 4 words, along with vertical
projection profiles, and figure 11(b) shows the 4 words, after segmentation. In
Figure 11(c), a word is shown segmented into its constituting 3 characters.
Overlapping, adjacent characters in a word (called kerned characters) cannot be
segmented using zero-valued valleys in the vertical projection profile.
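A minimal sketch of word segmentation along these lines follows, assuming a binary NumPy image of one text line (1 = black); the function name and `min_gap` parameter are ours. As noted above, it relies on zero-valued valleys, so kerned characters are not handled:

```python
import numpy as np

def segment_words(line_img, min_gap=3):
    # split a binary text-line image into words at white gaps of at
    # least `min_gap` columns in the vertical projection profile
    profile = line_img.sum(axis=0)
    words, start, gap = [], None, 0
    for x, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, x - gap + 1))  # [start, end) columns
                start, gap = None, 0
    if start is not None:
        words.append((start, len(profile) - gap))
    return words
```

Character segmentation within a word works the same way with a smaller (or zero) gap threshold.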
Any OCR contains more or less the same steps described below. The
exact number and techniques differ slightly from one language to another. We now
present the studies in different OCRs, along with a detailed description of the
methods used in them. Recognition of isolated and continuous printed multi font
Bengali characters is reported in the work by Mahmud et al (2003).
The chain codes from the center pixel are 0 for east, 1 for north-east, and so
on. This is represented pictorially in figure 12(a) and (b). Chain code gives the
boundary of the character image; slope distribution of chain code implies the
curvature properties of the character. In this work, connected components from
each character are divided into four regions with the center of mass as the
origin. The slope distribution of the chain code in these four regions is used as a local
feature. Using chain code representation, classification is done by a feed forward
neural network.
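A small sketch of the chain-code feature described above, assuming the boundary is already available as an ordered list of 8-connected (row, col) pixels (boundary tracing itself is omitted); the helper names are ours:

```python
import numpy as np

# Freeman 8-direction codes: 0 = east, then counter-clockwise, on
# (row, col) coordinates with row increasing downward
DIRS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
        (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code(points):
    # points: ordered 8-connected boundary pixels as (row, col) tuples
    return [DIRS[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(points, points[1:])]

def slope_distribution(codes):
    # normalized histogram of the 8 direction codes: a crude slope/
    # curvature feature of the kind computed per region in the text
    hist = np.bincount(np.asarray(codes), minlength=8)
    return hist / hist.sum()
```

In the cited work such histograms, computed in the four regions around the center of mass, form the input vector of the feed-forward neural network.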
Text lines are partitioned into three zones and the horizontal and vertical
projection profiles are used to segment the text into lines, words, and characters.
Primary grouping of characters into the basic, modified and compound
characters is made before the actual classification. A few stroke features are
used for this purpose along with a tree classifier where the decision at each node
of the tree is taken on the basis of presence/absence of a particular feature.
The compound character recognition is done in two stages:
1) In the first stage the characters are grouped into small sub-sets by the
above tree classifier.
Short lines will provide low peaks, and very narrow lines, as
well as those including many overlapping components, will not produce significant
peaks. In case of skew or moderate fluctuations of the text lines, the image may
be divided into vertical strips and profiles sought inside each strip. These
piecewise projections are thus a means of adapting to local fluctuations within a
more global scheme.
For printed and binarized documents, smearing methods such as the Run-
Length Smoothing Algorithm can be applied. Consecutive black pixels along the
horizontal direction are smeared: i.e. the white space between them is filled with
black pixels if their distance is within a predefined threshold. The bounding boxes
of the connected components in the smeared image enclose text lines.
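The horizontal smearing step can be sketched as follows, assuming a binary NumPy image with 1 = black (the function name and signature are ours):

```python
import numpy as np

def rlsa_horizontal(img, threshold):
    # fill any white run lying between two black pixels on the same
    # row with black, provided the run is at most `threshold` pixels
    out = img.copy()
    for r in range(img.shape[0]):
        black = np.flatnonzero(img[r])
        for a, b in zip(black, black[1:]):
            if b - a - 1 <= threshold:
                out[r, a:b] = 1
    return out
```

With a threshold somewhat larger than the typical inter-character gap, the bounding boxes of connected components in the smeared image enclose whole text lines, as the text describes.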
Text line patterns are found in the work of Shi and Govindaraju by building
a fuzzy run length matrix. At each pixel, the fuzzy run-length is the maximal
extent of the background along the horizontal direction. Some foreground pixels
may be skipped if their number does not exceed a predefined value. This matrix
is thresholded to make pieces of text lines appear without ascenders and
descenders (Fig. 14). Parameters have to be accurately and dynamically tuned.
Hence, these methods include one or several quality measures which ensure
that the text line under construction is of good quality. When comparing the
quality measures of two alignments in conflict, the alignment of lower quality can
be discarded (Fig.9). Also, during the grouping process, it is possible to choose
between the different units that can be aggregated within the same neighborhood
by evaluating the quality of each of the so-formed alignments.
Quality measures generally include the strength of the alignment, i.e. the
number of units included. Other quality elements may concern component size,
component spacing, or a measure of the alignment’s straightness.
Fig. 17 Hypothesized cells (ρ0, θ0) and (ρ1, θ1) in Hough space. Each peak
corresponds to perfectly aligned units. An alignment is composed of units
belonging to a cluster of cells (the cell structure) around a primary cell.
A cell structure of a cell (ρ, θ) includes all the cells lying in a cluster
centered on (ρ, θ). Consider the cell (ρ0, θ0) having the greatest count of units. A
second hypothesis (ρ1, θ1) is searched in the cell structure of (ρ0, θ0). The
alignment chosen between these two hypotheses is the strongest one, i.e. the
one which includes the highest number of units in its cell structure. And the
corresponding cell (ρ0, θ0) or (ρ1, θ1) is the primary cell (Fig. 17). However,
actual text lines do not always correspond to the alignments with the highest
number of units, since crossing alignments (running from top to bottom when the
writing is horizontal) may contain more units than actual text lines.
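The voting stage underlying this Hough-based grouping can be sketched as below, assuming unit centroids given as (row, col) points; the function name and discretization choices are ours. Each point votes, for every angle θ, for the cell whose ρ = x·cos θ + y·sin θ passes through it, and the primary cell is then found among the accumulator maxima:

```python
import numpy as np

def hough_accumulator(points, n_theta=180, rho_res=1.0):
    # accumulate votes in (rho, theta) space for lines through each
    # unit centroid: rho = x*cos(theta) + y*sin(theta)
    pts = np.asarray(points, dtype=float)
    thetas = np.deg2rad(np.arange(n_theta))
    max_rho = np.hypot(pts[:, 0].max(), pts[:, 1].max()) + 1
    n_rho = int(2 * max_rho / rho_res) + 1
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for y, x in pts:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rhos + max_rho) / rho_res).astype(int)
        acc[idx, np.arange(n_theta)] += 1
    return acc, max_rho
```

A cell structure around a peak then corresponds to a small neighborhood of accumulator cells, as in Fig. 17.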
This small top reservoir also helps in touching character detection and
segmentation. Not all reservoirs are considered for further processing:
reservoirs having heights greater than a threshold T1 are selected for further
use. For a component, the value of T1 is chosen as 1/9 times the component
height. (The threshold is determined from experiment.) We now discuss some
terms relating to water reservoirs that will be used in feature extraction.
Overlapping components are the main challenge for text line extraction,
since no white space is left between lines. Some of the methods surveyed above
do not need to detect such components, because they extract only baselines, or
because in the method itself some criteria make paths avoid crossing black
pixels. This section only deals with methods where ambiguous (overlapping)
components are actually detected before, during or after text line segmentation.
Such criteria as component size, the fact that the component belongs to several
alignments, or on the contrary to no alignment, can be used for detecting
ambiguous components.
Some solutions for separation of units belonging to several text lines can
be found also in the case of mail pieces and handwritten databases where efforts
have been made for recognition purposes. In one work, separation is made
from the skeleton of touching characters and the use of a dictionary of possible
touching configurations (Fig. 23). In Bruzzone and Coffetti, the contact point
between ambiguous strokes is detected and processed from their external
border.
An accurate analysis of the contour near the contact point is performed in
order to separate the strokes according to two registered configurations: a loop in
contact with a stroke, or two loops in contact. In simple cases of handwritten
pages the center of gravity of the connected component is used either to
associate the component to the current line or to the following line, or to cut the
component into two parts. This works well if the component is a single character.
It may fail if the component is a word, or part of a word, or even several words.
CHAPTER - 3
3.2 PROPOSED METHOD
The global horizontal projection method computes the sum of all black
pixels on every row and constructs the corresponding histogram. Based on the
peak/valley points of the histogram, individual lines are generally segmented.
Although this global horizontal projection method is applicable for line
segmentation of printed documents, it cannot be used on unconstrained
handwritten documents, because the characters of two consecutive text lines
may touch or overlap. For example, see the 4th and 5th text lines of the
document shown in figure 26(a).
Figure 26 (a) N-stripes and PSL lines in each stripe are shown for a sample
of handwritten text. (b) Potential PSLs of figure 26 (a) are shown.
This is done for all stripes. We compute the statistical mode (MPSL) of such
distances. If the distance between any two consecutive PSLs of a stripe is less
than MPSL, we remove the upper PSL of these two PSLs. The PSLs obtained after this
removal are the potential PSLs. The potential PSLs obtained from the PSLs of
figure 26 a are shown in figure 26b. We note the left and right co-ordinates of
each potential PSL for future use. By proper joining of these potential PSLs, we
get individual text lines. It may be noted that sometimes because of overlapping
or touching of one component of the upper line with a component of the lower
line, we may not get PSLs in some regions. Also, because of some modified
characters of Telugu we find some extra PSLs in a stripe. We take care of them
during PSL joining, as explained next. Joining of PSLs is done in two steps.
In the first step, we join PSLs from right to left and, in the second step, we first
check whether line-wise PSL joining is complete or not. If for a line it is not
complete, joining from left to right is done to obtain complete segmentation. We
say PSLs joining of a line is complete if the length of the joined PSLs is equal to
the column (width) of the document image. This two-step approach is done to get
good results even if two consecutive text lines are overlapping or connected.
To join a PSL of the ith stripe, say Ki, to a PSL of the (i − 1)th stripe, we check
whether any PSL, whose normal distance from Ki is less than MPSL, exists or
not in the (i − 1)th stripe. If it exists, we join the left co-ordinate of Ki with the right
co-ordinate of the PSL in the (i − 1)th stripe. If it does not exist, we extend Ki
horizontally in the left direction until it reaches the left boundary of the (i − 1)th
stripe or intersects a black pixel of any component in the (i − 1)th stripe. If the
extended part intersects the black pixel of a component of the (i − 1)th stripe, we
decide the “belongingness” of the component in the upper line or lower line.
Based on the belongingness of this component, we extend this line in such a way
that the component falls in its actual line. Belongingness of a component is
decided as follows.
We compute the distances from the intersecting point to the topmost and
bottommost point of the component. Let d1 be the top distance and d2 the
bottom distance.
If d1 < d2 and d1 < (MPSL/2) then the component belongs to the lower line.
If d2 ≤ d1 and d2 < (MPSL/2) then the component belongs to the upper line.
If d1 > (MPSL/2) and d2 > (MPSL/2) then we assume the component
touches another component of the lower line.
If the component belongs to the upper-line (lower-line) then the line is
extended following the contour of the lower part (upper part) of the component so
that the component can be included in the upper line (lower line).
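The three belongingness rules above can be collected into a small helper. This is a sketch; the function name and the "undecided" fallback for boundary cases not covered by the stated rules are ours:

```python
def belongingness(d1, d2, m_psl):
    # d1: distance from the intersection point to the component's
    # topmost point; d2: distance to its bottommost point;
    # m_psl: the statistical mode of PSL distances (MPSL in the text)
    half = m_psl / 2
    if d1 < d2 and d1 < half:
        return "lower"       # component belongs to the lower line
    if d2 <= d1 and d2 < half:
        return "upper"       # component belongs to the upper line
    if d1 > half and d2 > half:
        return "touching"    # components of two lines are touching
    return "undecided"       # boundary case not covered by the rules
```

The returned label then decides whether the PSL is extended along the component's upper or lower contour, or through a touching point.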
The line extension is done until it reaches the left boundary of the (i −1)th
stripe. If the component is touching, we detect possible touching points based on the
structural shape of the touching component. From the experiment, we notice that in
most of the touching cases there exist junction/crossing shapes or there exist some
obstacle points in the middle portion having low black pixel density of the touching
component. These obstacle points and the junction/crossing shapes help to find
the touching position. Extension of the PSL is done through this touching point to
segment the component into two parts.
Let L be the candidate length of a line. Now we scan each column of the
portion of the line that belongs to the candidate length to check for the presence
of black pixels. If black pixels do not exist in at least 50% of the columns of that
line, then the line is not a valid line, and we delete the lower boundary of this line
to merge it with its lower line. Thus a mis-segmented line like XY of figure
26(a) is corrected. The corrected line segmentation result is shown in figure 26(b).
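The 50% validity test can be sketched as follows, assuming a binary NumPy sub-image for the candidate line (1 = black); the function name is ours:

```python
import numpy as np

def is_valid_line(line_img):
    # keep a candidate line only if black pixels are present in at
    # least 50% of its columns; otherwise it should be merged with
    # the line below it, as described in the text
    has_black = line_img.sum(axis=0) > 0
    return has_black.mean() >= 0.5
```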
Step 2: Compute piece-wise separating lines (PSL) from each of these stripes as
discussed earlier.
Step 4: Choose the rightmost top potential PSL and extend (from right to left) this
PSL up to the previous stripe.
Step 5: Continue this PSL joining from right to left until we reach the left
boundary of the left-most stripe.
Step 6: Check whether the length of the line drawn equals the width of the
document. If yes, go to step 7. Else, PSL line extension is done to the right until
we reach the right boundary of the document.
Step 7: Repeat steps 4 to 6 for the potential PSLs not considered for joining so
far. If there is no more PSL for joining, stop.
Let us see all these steps in detail
(a)
Compute the row-wise sum of all black pixels of a stripe. A row where
the sum is zero is a PSL. If there are several consecutive rows where the
black-pixel count is zero, then the first of those rows is taken as the PSL.
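This step can be sketched directly, assuming each stripe is a binary NumPy sub-image (1 = black); the function name is ours:

```python
import numpy as np

def find_psls(stripe):
    # a PSL is a row of the stripe containing no black pixels; for a
    # run of consecutive blank rows, only the first row is kept
    blank = stripe.sum(axis=1) == 0
    return [r for r in range(len(blank))
            if blank[r] and (r == 0 or not blank[r - 1])]
```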
Fig 30 Potential PSLs of the text
Not all the PSLs may be useful for line segmentation, so choose
some potential PSLs among them. Compute the normal distances between two
consecutive PSLs in a stripe; if there are n PSLs we get n − 1 distances. This
is done for all stripes. Compute the statistical mode Mpsl of these distances. If
the distance between any two consecutive PSLs of a stripe is less than Mpsl,
remove the upper PSL of the two. The PSLs obtained after this removal are
the potential PSLs.
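The filtering step above can be sketched as follows, taking the PSL row positions of one stripe as input; the function name and return shape are ours:

```python
from statistics import mode

def potential_psls(psls):
    # Mpsl, the statistical mode of the distances between consecutive
    # PSLs, approximates the typical line height
    m = mode(b - a for a, b in zip(psls, psls[1:]))
    kept = list(psls)
    i = 0
    while i < len(kept) - 1:
        if kept[i + 1] - kept[i] < m:
            del kept[i]      # drop the upper PSL of a too-close pair
        else:
            i += 1
    return kept, m
```

In the full method the mode is computed over the distances of all stripes, not one stripe in isolation.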
ii) Check whether line-wise PSL joining is complete or not. If for a line it is
not complete, joining from left to right is done to obtain complete
segmentation.
We say PSL joining of a line is complete if the length of the joined PSLs is
equal to the column size (width) of the document image. This two-step approach
is done to get good results even if two consecutive text lines overlap or are
connected.
If the extended part intersects a black pixel of any component, then compute
the belongingness of the component. Compute the distances from the
intersecting point to the topmost and bottommost points of the component. Let d1
be the distance to the topmost point and d2 the distance to the bottommost point.
If d1 < d2 and d1 < (Mpsl/2) then the component belongs to the lower line. If d2 ≤ d1
and d2 < (Mpsl/2) then the component belongs to the upper line.
Following is the figure obtained after all the steps
3.3 APPLICATIONS
3.3.1 Practical Applications
3.3.2 Banking
The uses of OCR vary across different fields. One widely known
application is in banking, where OCR is used to process checks without human
involvement. A check can be inserted into a machine, the writing on it is scanned
instantly, and the correct amount of money is transferred. This technology has
nearly been perfected for printed checks, and is fairly accurate for handwritten
checks as well, though it occasionally requires manual confirmation. Overall, this
reduces wait times in many banks.
3.3.3 Legal
3.3.4 Healthcare
Healthcare has also seen an increase in the use of OCR technology to
process paperwork. Healthcare professionals always have to deal with large
volumes of forms for each patient, including insurance forms as well as general
health forms. To keep up with all of this information, it is useful to input relevant
data into an electronic database that can be accessed as necessary. Form
processing tools, powered by OCR, are able to extract information from forms
and put it into databases, so that every patient's data is promptly recorded. As a
result, healthcare providers can focus on delivering the best possible service to
every patient.
OCR is widely used in many other fields, including education, finance, and
government agencies. OCR has made countless texts available online, saving
money for students and allowing knowledge to be shared. Invoice imaging
applications are used in many businesses to keep track of financial records and
prevent a backlog of payments from piling up. In government agencies and
independent organizations, OCR simplifies data collection and analysis, among
other processes. As the technology continues to develop, more and more
applications are found for OCR technology, including increased use of
handwriting recognition. Furthermore, other technologies related to OCR, such
as barcode recognition, are used daily in retail and other industries.
Digital library initiatives are adopting advanced OCR technology like Prime
OCR to convert large book collections for on-line viewing of content. Not only is
Prime OCR designed to generate accurate results but it can also provide a level
of reliability that cannot be found in traditional desktop OCR software.
3.3.9 E-books
On-line retailers use Prime OCR's RTF results to retain text format and
layout to re-create books that can be marketed as e-books. Prime OCR's
character accuracy and retention of format allow clients to efficiently reproduce
machine printed material into electronic media.
Various clients use Prime OCR's high accuracy results to save time and
money in generating on-line content from bound books. Not only does Prime
OCR generate high accuracy character results but it retains excellent formatting
which cuts down on the time to format each page of the book for on-line viewing.
A large shipping company uses Prime Zone to scan bar codes on a signed
billing receipt. Once scanned into the system users can view the signed
receipt on-line by searching for the shipping reference number. Customer service
personnel are able to electronically e-mail the scanned signed receipt within
seconds instead of taking days to find a filed hard copy of the receipt.
Another large shipping company OCRs the invoice number from the
scanned invoice and has customized Prime OCR to rename the image file to the
invoice number, facilitating document storage, search, and retrieval.
Software Requirements
Hardware Requirements
RAM – 512 MB
HDD – 20 GB or higher
CHAPTER - 4
4.1 RESULTS
VERTICAL STRIPES & PSL’s OF INPUT IMAGE
FILTERED PSL’s
Fig 36 Filtered PSL’s
JOINING OF PSL’s
Fig 37 Joining of PSL’s
CHAPTER - 5
5.1 CONCLUSION
This paper has provided a comprehensive review of the methods for off-
line handwriting text line segmentation previously proposed by researchers. After
a brief description of the characteristics of text line structures in handwritten
documents, we have described the challenges in text line segmentation. We also
reviewed the different approaches to segmenting a handwritten document into
text lines and proposed a taxonomy. An extensive performance evaluation and
quantitative comparison of the experimental results of the previously proposed
methods was performed, and a study was made of different optical character
recognition systems developed for Indian scripts. The technologies of these
OCRs are discussed at length in this paper, which can be used as a starting step
for researchers entering this area.
BIBLIOGRAPHY
Rodolfo P. dos Santos, Gabriela S. Clemente, Tsang Ing Ren and George D. C.
Cavalcanti, Center of Informatics, Federal University of Pernambuco, Recife, PE,
Brazil. www.cin.ufpe.br/~viisar, {rps2,gsc2,tir,gdcc}@cin.ufpe.br
Tappert, Charles C., et al., “The State of the Art in On-line Handwriting
Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 12, No. 8, August 1990, pp. 787 ff.
http://users.erols.com/rwservices/pens/biblio90.html#Tappert90c, retrieved 2008-10-03
N. Tripathy and U. Pal, Computer Vision and Pattern Recognition Unit, Indian
Statistical Institute, 203 B.T. Road, Kolkata 700 108, India. umapada@isical.ac.in.
MS received 19 June 2004; revised 11 May 2006
http://www.di.uoa.gr, louloud@mm.di.uoa.gr, halatsis@di.uoa.gr
M. K. Jindal, Department of Computer Applications, Panjab University Regional
Centre, Muktsar, Punjab, India. manishphd@rediffmail.com
T. K. Bhowmik, IBM Global Services Pvt Ltd, Embassy Golf Link, Bangalore
560 071, India. tbhowmik@in.ibm.com