
 Ignore the tuple: This is done when the class label is missing. The method is not very
effective unless the tuple contains several attributes with missing values.
 Fill in the missing values manually: This technique is effective on a small data set with
few missing values.
 Replace missing values with a global constant.
 Replace missing values with the attribute mean or the most probable value.
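The strategies above can be sketched in a few lines of Python; the toy rows and the use of None as the missing-value marker are illustrative assumptions, not part of the original notes:

```python
def drop_missing(rows):
    """Ignore the tuple: discard any row containing a missing value."""
    return [r for r in rows if None not in r.values()]

def fill_constant(rows, attr, constant):
    """Replace missing values of one attribute with a global constant."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]

def fill_mean(rows, attr):
    """Replace missing values of a numeric attribute with its mean."""
    known = [r[attr] for r in rows if r[attr] is not None]
    mean = sum(known) / len(known)
    return [{**r, attr: mean if r[attr] is None else r[attr]} for r in rows]

rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 60000},
    {"age": 35, "income": None},
]

print(len(drop_missing(rows)))            # 1 (only the first row is complete)
print(fill_mean(rows, "age")[1]["age"])   # 30.0 (mean of 25 and 35)
```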

The sources may involve multiple databases, data cubes, or flat files. One of the most common
implementations of data integration is building an enterprise data warehouse.
Tight coupling: In this approach, data from the different sources is integrated into a single
physical location through the process of ETL – Extraction, Transformation, and Loading.
Loose coupling: Data remains in the original source databases. An interface takes a query
from the user, transforms it into a format each source database can understand, sends it
directly to the source databases, and combines the results.
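The loose-coupling approach can be sketched as a tiny mediator in Python. The two in-memory "source databases", their key formats, and the unified schema are all invented for illustration:

```python
# Two source databases with different native key and field conventions.
source_a = {"cust_1": {"name": "Ana", "city": "Cluj"}}      # keyed by "cust_<id>"
source_b = {1: {"customer_name": "Ana", "balance": 120.0}}  # keyed by integer id

def query_customer(customer_id):
    # Transform the user's query into each source's native key format.
    rec_a = source_a.get(f"cust_{customer_id}", {})
    rec_b = source_b.get(customer_id, {})
    # Merge the per-source results under one unified schema;
    # the data itself never leaves the source databases.
    return {
        "name": rec_a.get("name") or rec_b.get("customer_name"),
        "city": rec_a.get("city"),
        "balance": rec_b.get("balance"),
    }

print(query_customer(1))  # {'name': 'Ana', 'city': 'Cluj', 'balance': 120.0}
```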

It is usually done from the structure of a source system into the required structure of a
new destination system. The process fundamentally involves converting data formats, but such
conversions sometimes also involve translating a program from one computer language to
another so that the program can run on a different platform. The purpose of this data
migration is usually the adoption of a new system that is entirely different from the
previous one.

 Smoothing: Noise is removed from the data.
 Aggregation: Summary or aggregate values are applied to the data.
 Generalisation: Low-level data is replaced with higher-level concepts using a notion
known as concept hierarchy climbing.
 Normalisation: Attributes are scaled so that they fall within a small specified range,
such as 0.0 to 1.0.
 Attribute construction: New attributes are created from the given set of attributes.
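Normalisation, for example, can be written directly from its definition. This is a minimal min-max scaling sketch; the function name and sample values are assumptions:

```python
def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into [new_min, new_max] (min-max normalisation)."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]

print(min_max_normalise([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```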

Data discretization is defined as a process of converting continuous data attribute values into a
finite set of intervals and associating with each interval some specific data value.

Top-down discretisation: The process first finds one or a few points (called split points or
cut points), uses them to split the entire attribute range, and then repeats this recursively
on the resulting intervals.
Bottom-up discretisation: The process starts by considering all of the continuous values as
possible split points, then removes some of them by merging neighbouring values to form
intervals.
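A simple concrete instance of top-down splitting is equal-width discretisation, where the cut points divide the attribute range into intervals of equal size. The function and sample data below are illustrative assumptions:

```python
def equal_width_split(values, k):
    """Top-down discretisation: choose k-1 equally spaced cut points over the
    attribute range, then map each value to the interval it falls in."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    splits = [lo + i * width for i in range(1, k)]   # the cut points

    def interval(v):
        for i, s in enumerate(splits):
            if v < s:
                return i
        return k - 1

    return [interval(v) for v in values]

print(equal_width_split([1, 3, 5, 7, 9], 2))  # cut point at 5 -> [0, 0, 1, 1, 1]
```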

In a multidimensional model, data is systematically arranged into multiple dimensions, and each
dimension has multiple levels of abstraction defined by concept hierarchies. This gives users
the flexibility to observe data from different perspectives.

 Binning: A top-down, unsupervised splitting technique based on a specified number of
bins.
 Histogram analysis: An unsupervised discretisation technique that separates the values
of an attribute into disjoint ranges called buckets.
 Cluster analysis: A well-known discretisation method that uses a clustering algorithm
to partition the values of a numerical attribute into clusters or groups.
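Cluster analysis can be illustrated with a tiny one-dimensional k-means written from scratch; each resulting cluster becomes one interval of the discretised attribute. The implementation and data are a sketch under assumptions, not a specific library's algorithm:

```python
def kmeans_1d(values, k, iters=20):
    """Partition 1-D values into k clusters with plain k-means;
    each cluster label then serves as a discrete interval id."""
    vals = sorted(values)
    # Initialise centres spread across the sorted values.
    centres = [vals[i * (len(vals) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Move each centre to the mean of its assigned values.
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels

print(kmeans_1d([1, 2, 3, 10, 11, 12], 2))  # [0, 0, 0, 1, 1, 1]
```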
