
Legacy Data

June 3, 2009, by Editorial Team, in Data Types


Legacy data comes from virtually everywhere within the information systems that support legacy systems. Its many sources include databases, which are often relational but may also be hierarchical, network, object, XML, or object/relational. Legacy data is another term used for disparate data.
Files such as XML documents and flat files, including configuration files and comma-delimited text files, may also be sources of legacy data. But the biggest sources of legacy data are old, outdated, and antiquated legacy systems.
A legacy system is an existing group of computers or application programs that is old and outdated, but that companies still refuse to give up because it still serves them well.
These systems are usually large, and companies have invested so much money in implementing them that, despite the potential problems identified by IT professionals, many still want to keep them for several reasons.
One of the main problems with legacy systems is that they often run on very slow and obsolete hardware whose parts, when broken, are very difficult to replace. Because of the general lack of understanding of these old technologies, the systems are often very hard to maintain, improve, and expand. And because they are old and obsolete, chances are that the operations manuals and other documentation have been lost over the years.
Despite the emergence of newer technologies with relatively cheaper individual parts, many companies still have compelling reasons for keeping such old and antiquated systems, whose data adds to the disparity in data warehouse systems.
One of the biggest reasons is that legacy systems were implemented to be large and monolithic in nature, so a one-time redesign and reimplementation would be very costly and complicated. If legacy systems were taken out all at once, the whole business process would be halted for some time because of the monolithic and centralized nature of these systems.
Most companies cannot afford any business stoppage, especially in today's very fast-paced, data-driven business environment. What worsens the situation even more is that legacy systems are not well understood by younger IT professionals, so redesigning them to adapt to newer technologies would take long and intensive planning.
That is why it is very common nowadays to see data warehouses that are a combination of new and legacy systems. The effect is legacy data that is highly incompatible with the data coming from sources that use newer technologies.
In fact, different vendors of new technology encounter differing data disparity problems when working with legacy systems. IBM alone has enumerated some typical legacy data problems, which include, among others:
Incorrect data values
Inconsistent/incorrect data formatting
Missing data
Missing columns
Additional columns
Multiple sources for the same data
A single column being used for several purposes
The purpose of a column is determined by the value of one or more other columns
Important entities, attributes, and relationships hidden and floating in text fields
Data values that stray from their field descriptions and business rules
Various key strategies for the same type of entity
Unrealized relationships between data records
One attribute is stored in several fields
Inconsistent use of special characters
Different data types for similar columns
Different levels of detail
Different modes of operation
Varying timeliness of data
Varying default values and various other representations
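A few of the problems listed above, such as missing data and inconsistent formatting, can be detected with a simple profiling pass before any conversion is attempted. The following is a minimal sketch, not a method described by IBM or by this article. The sample rows, the column names (`cust_id`, `signup`, `phone`), and the two date formats are all invented for illustration.

```python
from datetime import datetime

# Hypothetical legacy rows; column names and values are invented for illustration.
LEGACY_ROWS = [
    {"cust_id": "001", "signup": "2009-06-03", "phone": "555-0100"},
    {"cust_id": "2", "signup": "03/06/2009", "phone": ""},
    {"cust_id": "003", "signup": "unknown", "phone": "(555) 0102"},
]

# Assumed set of date formats that the legacy sources might use.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def date_format_of(value):
    """Return the first known format that parses value, or None if none does."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def profile(rows):
    """Count empty values per column and tally the date formats seen in 'signup'."""
    missing = {}
    formats = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
        fmt = date_format_of(row["signup"])
        formats[fmt] = formats.get(fmt, 0) + 1
    return {"missing": missing, "signup_formats": formats}


report = profile(LEGACY_ROWS)
print(report)
```

A report like this only flags symptoms. Deciding which format or default value is correct still depends on the business rules of each source system.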
The data disparity problems that legacy data brings to a data warehouse can be addressed through the process of ETL (extract, transform, load). ETL is a mechanism for converting disparate data, not just from legacy systems but from all other disparate data sources as well, before it is loaded into the data warehouse.
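The three ETL stages can be illustrated with a small, self-contained sketch. It assumes a hypothetical comma-delimited export from a legacy system and uses an in-memory SQLite database as a stand-in for the data warehouse. The table name `customer`, the column names, and the date formats are assumptions made for the example, not part of any particular ETL product.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical comma-delimited export from a legacy system.
LEGACY_CSV = """cust_id,name,signup
001, Alice ,2009-06-03
2,BOB,03/06/2009
003,carol,
"""

# Assumed date formats used by the legacy sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def extract(text):
    """Extract: read raw rows from the legacy export."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize_date(value):
    """Convert a known legacy date format to ISO 8601; return None if missing or unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(rows):
    """Transform: unify the key type, the name casing, and the date format."""
    return [
        (int(row["cust_id"]), row["name"].strip().title(), normalize_date(row["signup"]))
        for row in rows
    ]


def load(records, conn):
    """Load: insert the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, signup TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(transform(extract(LEGACY_CSV)), conn)
rows = conn.execute("SELECT id, name, signup FROM customer ORDER BY id").fetchall()
print(rows)
```

Keeping the three stages as separate functions means a new legacy source usually needs only its own `extract` and some extra transformation rules, while the load step stays unchanged.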
