
Document Management System for BSNL

Abstract
Introduction
Knowledge-intensive organizations hold a vast array of information in large
document repositories. With the advent of e-commerce and corporate
intranets/extranets, these repositories are expected to grow at a fast pace. This
explosive growth has led to huge, fragmented, and unstructured document
collections. Although it has become easier to collect and store information in
document collections, it has become increasingly difficult to retrieve relevant
information from them.
There are three important paradigms of research in the area of information retrieval
(IR): probabilistic IR, knowledge-based IR, and artificial-intelligence-based
techniques such as neural networks and symbolic learning. Very few researchers have
tried to use evolutionary algorithms such as genetic algorithms (GAs). Previous
attempts at using GAs have concentrated on modifying document representations
or query representations. This work looks at the possibility of applying
GAs to adapt the matching functions themselves. It is hoped that such an adaptation
will lead to better retrieval performance than that obtained by
using a single matching function. The overall matching function is treated as a
weighted combination of the scores produced by individual matching functions. This
overall score is used to rank and retrieve documents. The weights associated with
the individual functions are searched for using a genetic algorithm.
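The weighted combination described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the two matching functions (`term_overlap`, `jaccard`) and all other names are hypothetical stand-ins, since the source does not say which matching functions were combined.

```python
from typing import Callable, List, Sequence, Tuple

# A matching function maps (query, document) to a similarity score.
MatchFn = Callable[[str, str], float]


def term_overlap(query: str, doc: str) -> float:
    """Hypothetical matcher: fraction of query terms found in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def jaccard(query: str, doc: str) -> float:
    """Hypothetical matcher: Jaccard similarity of the two term sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def overall_score(query: str, doc: str,
                  matchers: Sequence[MatchFn],
                  weights: Sequence[float]) -> float:
    """Weighted sum of the individual matching scores."""
    return sum(w * m(query, doc) for w, m in zip(weights, matchers))


def rank(query: str, docs: Sequence[str],
         matchers: Sequence[MatchFn],
         weights: Sequence[float]) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, best match first."""
    scored = [(overall_score(query, d, matchers, weights), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = ["sim card application form", "billing report"]
ranked = rank("sim card", docs, [term_overlap, jaccard], [0.5, 0.5])
```

In this picture the weight vector (`[0.5, 0.5]` here) is what a GA would search over, with retrieval quality on known queries serving as the fitness measure.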
Project
The project concerns maintaining and retrieving documents with the help of
automated software.
Earlier, the system was maintained manually with a little
assistance from software. The software was used only to store SIM card details against
the Customer Application Form number (CAF No.). All other activities, including
storing, retrieving, and validating the documents, were carried out manually.
Optimisation
With the automation software, all the documents have been converted into soft copies
and uploaded along with their key details. After the documents are authorized,
the hard copy and the soft copy are tagged with a lot number. A search by lot
number then makes it easy to retrieve those documents for further verification.
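Lot-number retrieval of this kind can be pictured as a simple in-memory index. The sketch below is only an assumption about how such a lookup might be organized; the `Document` fields, the `LotIndex` class, and the sample values are hypothetical and do not come from the deployed system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class Document:
    caf_no: str     # Customer Application Form number
    lot_no: str     # lot number assigned after authorization
    file_path: str  # location of the scanned soft copy (hypothetical field)


class LotIndex:
    """Maps a lot number to every document tagged with it."""

    def __init__(self) -> None:
        self._by_lot: DefaultDict[str, List[Document]] = defaultdict(list)

    def add(self, doc: Document) -> None:
        self._by_lot[doc.lot_no].append(doc)

    def find(self, lot_no: str) -> List[Document]:
        # Return a copy so callers cannot modify the index by accident.
        return list(self._by_lot.get(lot_no, []))


index = LotIndex()
index.add(Document("CAF-1001", "LOT-001", "scans/caf-1001.pdf"))
index.add(Document("CAF-1002", "LOT-001", "scans/caf-1002.pdf"))
index.add(Document("CAF-2001", "LOT-002", "scans/caf-2001.pdf"))
lot_one = index.find("LOT-001")
```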
Genetic Algorithm
The GENETIC ALGORITHM is a model of machine learning which derives its behavior
from a metaphor of the processes of Evolution in nature. This is done by the
creation within a machine of a Population of Individuals represented by
Chromosomes, in essence a set of character strings that are analogous to the
base-4 chromosomes that we see in our own DNA. The individuals in the population then
go through a process of evolution.
As it turns out, there are mathematical proofs that indicate that the process of
Fitness proportionate Reproduction is, in fact, near optimal in some senses.
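Fitness-proportionate reproduction is often implemented as "roulette wheel" selection, where each individual's chance of being picked is its share of the total fitness. The sketch below is one common way to write it, assuming non-negative fitness values; the function name and parameters are illustrative only.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def roulette_select(population: Sequence[T],
                    fitnesses: Sequence[float],
                    k: int,
                    rng: random.Random) -> List[T]:
    """Pick k individuals with probability proportional to fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one individual needs positive fitness")
    chosen: List[T] = []
    for _ in range(k):
        spin = rng.random() * total  # a point on the wheel in [0, total)
        running = 0.0
        for individual, fit in zip(population, fitnesses):
            running += fit
            if spin < running:
                chosen.append(individual)
                break
        else:
            # Only reachable through floating-point rounding.
            chosen.append(population[-1])
    return chosen


picks = roulette_select(["a", "b", "c"], [0.0, 5.0, 0.0], 10, random.Random(1))
```

An individual with zero fitness occupies no space on the wheel, so in the example only `"b"` can be selected.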
GENETIC ALGORITHMs are used for a number of different application areas. An
example of this would be multidimensional Optimization problems in which the
character string of the Chromosome can be used to encode the values for the
different parameters being optimized.
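To make the encoding idea concrete, the sketch below decodes a bit-string chromosome into several real-valued parameters. The fixed number of bits per parameter and the common `[low, high]` range are assumptions chosen for illustration.

```python
from typing import List


def decode(chromosome: str, bits_per_param: int,
           low: float, high: float) -> List[float]:
    """Split a bit string into equal-width genes and scale each to [low, high]."""
    if bits_per_param <= 0 or len(chromosome) % bits_per_param != 0:
        raise ValueError("chromosome length must be a multiple of bits_per_param")
    max_int = 2 ** bits_per_param - 1
    values = []
    for start in range(0, len(chromosome), bits_per_param):
        gene = chromosome[start:start + bits_per_param]
        # int(gene, 2) reads the gene as an unsigned binary integer.
        values.append(low + (high - low) * int(gene, 2) / max_int)
    return values


params = decode("00001111", bits_per_param=4, low=0.0, high=1.0)
```

Here an 8-bit chromosome carries two 4-bit parameters; `0000` maps to the lower bound and `1111` to the upper bound.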
In practice, therefore, we can implement this genetic model of computation by
having arrays of bits or characters to represent the Chromosomes. Simple bit
manipulation operations allow the implementation of Crossover, Mutation and other
operations.
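As an illustration of the bit manipulation mentioned above, the sketch below stores each chromosome in a Python integer and implements one-point crossover and bit-flip mutation with masks and XOR. The function names and the choice of one-point crossover are assumptions; the source does not specify which operators were used.

```python
import random
from typing import Tuple


def one_point_crossover(a: int, b: int, point: int, length: int) -> Tuple[int, int]:
    """Exchange the low `point` bits of two `length`-bit chromosomes."""
    low_mask = (1 << point) - 1               # bits below the cut
    high_mask = ((1 << length) - 1) & ~low_mask  # bits at and above the cut
    child1 = (a & high_mask) | (b & low_mask)
    child2 = (b & high_mask) | (a & low_mask)
    return child1, child2


def mutate(chromosome: int, length: int, rate: float, rng: random.Random) -> int:
    """Flip each of the `length` bits independently with probability `rate`."""
    for bit in range(length):
        if rng.random() < rate:
            chromosome ^= 1 << bit  # XOR with a one-bit mask flips that bit
    return chromosome


c1, c2 = one_point_crossover(0b11110000, 0b00001111, point=4, length=8)
flipped = mutate(0b1010, length=4, rate=1.0, rng=random.Random(0))
```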
One iteration of the select, recombine, mutate, and evaluate loop is referred to as a
Generation. There is no theoretical reason for using discrete generations as an
implementation model. Indeed, we do not see this punctuated
behavior in Populations in nature as a whole, but it is a convenient implementation
model.
The first Generation (generation 0) of this process operates on a Population of
randomly generated Individuals. From there on, the genetic operations, in concert
with the Fitness measure, operate to improve the population.
PSEUDO CODE
Algorithm GA is
    // start with an initial time
    t := 0;
    // initialize a usually random population of individuals
    initpopulation P (t);
    // evaluate fitness of all initial individuals of population
    evaluate P (t);
    // test for termination criterion (time, fitness, etc.)
    while not done do
        // increase the time counter
        t := t + 1;
        // select a sub-population for offspring production
        P' := selectparents P (t);
        // recombine the "genes" of selected parents
        recombine P' (t);
        // perturb the mated population stochastically
        mutate P' (t);
        // evaluate its new fitness
        evaluate P' (t);
        // select the survivors from actual fitness
        P := survive P, P' (t);
    od
end GA.
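
The pseudo code can be turned into a small runnable program. The version below is a sketch under stated assumptions rather than the project's implementation: it uses lists of 0/1 bits as chromosomes, fitness-proportionate parent selection, one-point crossover, bit-flip mutation, and an elitist survivor step that keeps the best individuals from parents and offspring combined. All parameter names and default values are illustrative.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def run_ga(fitness: Callable[[Chromosome], float],
           length: int = 20, pop_size: int = 30, generations: int = 50,
           crossover_rate: float = 0.9, mutation_rate: float = 0.02,
           seed: int = 0) -> Chromosome:
    """Maximize a non-negative `fitness` over bit strings of `length` >= 2."""
    rng = random.Random(seed)
    # initpopulation: generation 0 is random
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # evaluate + selectparents: fitness-proportionate weights
        # (the small constant keeps the weights valid if all fitnesses are 0)
        weights = [fitness(c) + 1e-9 for c in pop]
        offspring: List[Chromosome] = []
        while len(offspring) < pop_size:
            a, b = (p[:] for p in rng.choices(pop, weights=weights, k=2))
            # recombine: one-point crossover
            if rng.random() < crossover_rate:
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # mutate: flip each bit with a small probability
            for child in (a, b):
                for i in range(length):
                    if rng.random() < mutation_rate:
                        child[i] ^= 1
                offspring.append(child)
        # survive: keep the fittest pop_size individuals of P and P'
        pop = sorted(pop + offspring[:pop_size], key=fitness, reverse=True)[:pop_size]
    return pop[0]


# Toy problem: maximize the number of 1 bits in the chromosome.
best = run_ga(fitness=sum)
```

Because the survivor step is elitist, the best fitness in the population never decreases from one generation to the next.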

Implementation
The project has been implemented at one of the headquarters of India's leading
telecom service provider, BSNL. The branch holds about 10 lakh (1 million) documents,
and the volume increases at a rate of about 10,000 documents per year.
The implementation took about 6 months and required 4 people working full time.
Results
Earlier, the retrieval and verification of about 250 documents per month took about
15 man-days; today it can be completed within 5 hours.
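For a rough sense of scale, the two figures can be compared directly. The calculation below assumes an 8-hour man-day and treats the 5 hours as working hours; the source states neither, so the ratio is indicative only.

```latex
\[
15 \text{ man-days} \times 8\,\frac{\text{h}}{\text{man-day}} = 120 \text{ h},
\qquad
\frac{120 \text{ h}}{5 \text{ h}} = 24
\]
```

Under these assumptions the monthly retrieval and verification effort fell by a factor of roughly 24.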
