
DATA WAREHOUSING

INTRODUCTION
Data Warehouse (Ralph Kimball)

DEFINITION: A Data Warehouse (DWH) is a relational database that is specifically designed for analyzing the business, not for business transactional processing.

DECISION SUPPORT SYSTEM (DSS): Since a DWH is designed to support the decision-making process, it is also known as a Decision Support System.

HISTORICAL DATABASE: Since a DWH maintains historical business transactions for analyzing the historical trends of the business, it is also known as a Historical Database.

INTEGRATED DATABASE: A DWH integrates data from multiple OLTP databases, hence it is also known as an Integrated Database.

READ-ONLY DATABASE: Since the database is designed only for querying data for analysis, not for transactional processing, it is called a Read-Only Database.

DATA ACQUISITION: It is the process of extracting data from multiple OLTP source systems, integrating the data into a homogeneous format, and loading it into the Data Warehouse.
There are two types of ETL used to build data acquisition:
i) Code Based ETL
ii) GUI Based ETL

CODE BASED ETL: An ETL application is developed using programming languages such as SQL and PL/SQL.
Ex: - SAS Base & SAS/ACCESS, Teradata ETL utilities.
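A minimal sketch of what "code based" means in practice, assuming an in-memory SQLite database stands in for both the OLTP source and the warehouse. The table names `src_orders` and `dw_sales` and the transformation rules are hypothetical; the point is that extraction, transformation, and loading are all written by hand as SQL.

```python
import sqlite3

def run_sql_etl(conn: sqlite3.Connection) -> int:
    """Code-based ETL: extract, transform, and load with hand-written SQL.

    Returns the number of rows in the target table after the load.
    """
    # Hypothetical warehouse target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_sales ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, revenue REAL)"
    )
    # Extract from the source table, transform (trim/uppercase the name,
    # derive revenue = qty * price, skip rows with a null qty), and load.
    conn.execute(
        "INSERT INTO dw_sales (order_id, customer, revenue) "
        "SELECT order_id, UPPER(TRIM(customer)), qty * price "
        "FROM src_orders WHERE qty IS NOT NULL"
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM dw_sales").fetchone()[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (order_id INTEGER, customer TEXT, qty INTEGER, price REAL)")
    conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?, ?)",
                     [(1, " alice ", 2, 10.0), (2, "bob", None, 5.0), (3, "carol", 1, 7.5)])
    print(run_sql_etl(conn))
    print(conn.execute("SELECT * FROM dw_sales ORDER BY order_id").fetchall())
```

A GUI based tool generates equivalent logic from a visual design instead of from hand-written statements.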

GUI BASED ETL: An ETL application is developed with a simple graphical user interface (GUI) using point-and-click techniques.
Ex: - Informatica, DataStage, Ab Initio.

DATA EXTRACTION: It is the process of reading data from multiple OLTP source systems. The following are the different source systems:
I. Mainframes
II. Oracle Applications
III. SAP
IV. PeopleSoft
V. XML Files
VI. Flat Files
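A small sketch of extraction from two of the source types listed above, a flat file (CSV) and an XML file, into one common record format. The inline sample data and the field names `id` and `name` are hypothetical; a real extraction would read from files or source connections.

```python
import csv
import io
import xml.etree.ElementTree as ET

def extract_flat_file(text: str) -> list[dict]:
    """Read a comma-delimited flat file that has a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def extract_xml(text: str) -> list[dict]:
    """Read <customer> elements, turning each child element into a field."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in cust} for cust in root.findall("customer")]

FLAT = "id,name\n1,Alice\n2,Bob\n"
XML = "<customers><customer><id>3</id><name>Carol</name></customer></customers>"

if __name__ == "__main__":
    # Both sources end up as the same list-of-dicts shape.
    print(extract_flat_file(FLAT) + extract_xml(XML))
```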



DATA TRANSFORMATION: It is the process of converting data into the required business format.
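As an illustration, the sketch below converts one source record into a "business format". The rules (ISO dates, full gender names, amounts in currency units instead of cents) and the field names are hypothetical examples, not rules taken from any particular system.

```python
from datetime import datetime

# Hypothetical business rule: gender codes are expanded to full names.
GENDER_CODES = {"M": "Male", "F": "Female"}

def transform(record: dict) -> dict:
    """Convert a raw source record into the (assumed) target business format."""
    return {
        "customer_id": int(record["cust_id"]),
        # Source dates arrive as DD/MM/YYYY; the target expects ISO YYYY-MM-DD.
        "order_date": datetime.strptime(record["ord_dt"], "%d/%m/%Y").date().isoformat(),
        "gender": GENDER_CODES.get(record["gender"].strip().upper(), "Unknown"),
        # Source stores cents; the target stores currency units.
        "amount": round(int(record["amount_cents"]) / 100, 2),
    }

if __name__ == "__main__":
    print(transform({"cust_id": "7", "ord_dt": "31/12/2023", "gender": " f", "amount_cents": "1999"}))
```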

DATA CLEANSING: It is the process of filtering out unwanted data.
Ex: i) Removing records that contain nulls.
ii) Eliminating duplicates.
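The two cleansing examples above can be sketched as a single filter pass. This assumes records are dictionaries with hashable values and that "duplicate" means an exact copy of an earlier record; real pipelines often deduplicate on a business key instead.

```python
def cleanse(records: list[dict]) -> list[dict]:
    """Drop records containing a null (None), then drop exact duplicates (first one wins)."""
    seen = set()
    clean = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # i) remove records which contain nulls
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue  # ii) eliminate duplicates
        seen.add(key)
        clean.append(rec)
    return clean

if __name__ == "__main__":
    rows = [{"id": 1, "city": "Pune"}, {"id": 2, "city": None},
            {"id": 1, "city": "Pune"}, {"id": 3, "city": "Delhi"}]
    print(cleanse(rows))
```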

DATA SCRUBBING: It is the process of deriving new data which is not available in the
source.
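A minimal sketch of scrubbing, assuming a source record that carries quantity, unit price, first and last name, and an order date. The derived columns `revenue`, `full_name`, and `order_year` are hypothetical examples of data the source does not hold.

```python
from datetime import date

def scrub(record: dict) -> dict:
    """Return a copy of the record with derived columns added."""
    out = dict(record)
    out["revenue"] = record["quantity"] * record["unit_price"]          # derived measure
    out["full_name"] = f'{record["first_name"]} {record["last_name"]}'  # derived attribute
    out["order_year"] = record["order_date"].year                       # derived from a date
    return out

if __name__ == "__main__":
    print(scrub({"quantity": 3, "unit_price": 2.5, "first_name": "Ann",
                 "last_name": "Lee", "order_date": date(2024, 5, 1)}))
```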

DATA MERGING: It is the process of integrating data records from multiple sources. There are two types of merging:
I. Vertical Merging
II. Horizontal Merging

VERTICAL MERGING: It is the process of integrating data records from sources with similar definitions by stacking them vertically (a union).

HORIZONTAL MERGING: It is the process of integrating data records horizontally using a JOIN (based on common column values).
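Both kinds of merging can be sketched over lists of dictionaries. Vertical merging here behaves like SQL `UNION ALL` (add a deduplication step if `UNION` semantics are wanted); horizontal merging is an inner join on one common column. The sample sources are hypothetical.

```python
def vertical_merge(*sources: list[dict]) -> list[dict]:
    """Stack records from sources that share the same definition (like UNION ALL)."""
    merged = []
    for src in sources:
        merged.extend(src)
    return merged

def horizontal_merge(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two sources on a common column (like JOIN ... ON left.key = right.key)."""
    index: dict = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

if __name__ == "__main__":
    east, west = [{"id": 1, "amt": 10}], [{"id": 2, "amt": 20}]
    print(vertical_merge(east, west))
    customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
    orders = [{"id": 1, "amt": 10}, {"id": 1, "amt": 5}]
    print(horizontal_merge(customers, orders, "id"))
```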

DATA AGGREGATION: It is the process of calculating business summaries from detailed data.
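Aggregation amounts to a grouped sum over detail rows. The sketch below is the plain-Python equivalent of `SELECT region, SUM(revenue) ... GROUP BY region`; the column names are hypothetical.

```python
from collections import defaultdict

def aggregate(details: list[dict], group_by: tuple[str, ...], measure: str) -> dict:
    """Summarize detail rows: SUM(measure) grouped by the given columns."""
    totals: defaultdict = defaultdict(float)
    for row in details:
        totals[tuple(row[c] for c in group_by)] += row[measure]
    return dict(totals)

if __name__ == "__main__":
    details = [
        {"region": "East", "year": 2023, "revenue": 100.0},
        {"region": "East", "year": 2023, "revenue": 50.0},
        {"region": "West", "year": 2023, "revenue": 70.0},
    ]
    print(aggregate(details, ("region",), "revenue"))
```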

DATA LOADING: It is the process of inserting data into the target system. There are two types of loads:
I. Initial Load
II. Incremental Load

INITIAL LOAD: It is the process of inserting data into an empty target table.
INCREMENTAL LOAD: It is the process of loading only new records after the initial load.
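The two load types can be sketched against a SQLite target table. The sketch assumes a hypothetical `target(id, amount)` table and that new source records always have a higher `id` than anything already loaded, so the current maximum `id` acts as a simple watermark; real systems often use timestamps or change-data-capture instead.

```python
import sqlite3

def initial_load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert all rows into an empty target table."""
    (count,) = conn.execute("SELECT COUNT(*) FROM target").fetchone()
    if count:
        raise RuntimeError("initial load expects an empty target table")
    conn.executemany("INSERT INTO target (id, amount) VALUES (?, ?)", rows)
    conn.commit()

def incremental_load(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert only rows newer than the highest id already loaded; return how many."""
    (watermark,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM target").fetchone()
    new_rows = [r for r in rows if r[0] > watermark]
    conn.executemany("INSERT INTO target (id, amount) VALUES (?, ?)", new_rows)
    conn.commit()
    return len(new_rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL)")
    initial_load(conn, [(1, 10.0), (2, 20.0)])
    print(incremental_load(conn, [(1, 10.0), (2, 20.0), (3, 30.0)]))
```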

ETL CLIENT-SERVER TECHNOLOGY:

ETL CLIENT: An ETL client is a graphical application that allows the developer to design the plan of the ETL process. An ETL plan is designed with the following components:
i) Source definition
ii) Target definition
iii) Transformation rule (business logic)

METADATA: Metadata defines the data and the process, including the structure of the data.

ETL REPOSITORY: The repository is the brain of the ETL system; it stores the metadata required to perform the ETL process.

ETL SERVER: An ETL server is an engine that performs extraction, transformation, and loading.

OPEN DATABASE CONNECTIVITY (ODBC): ODBC is an interface that provides access to various databases.
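The client, repository, and server roles can be illustrated with a toy model: the plan designed in the client is stored as metadata (source definition, target definition, transformation rule) in a repository, and the server reads that metadata to run the job. `REPOSITORY`, `load_customers`, and `etl_server` are hypothetical names; real tools store rules as expressions rather than Python functions.

```python
# Hypothetical repository: metadata for one ETL plan designed in the client.
REPOSITORY = {
    "load_customers": {
        "source": {"name": "crm_customers", "columns": ["id", "name"]},
        "target": {"name": "dim_customer", "columns": ["customer_id", "customer_name"]},
        "rule": {"customer_name": str.upper},  # transformation rule (business logic)
    }
}

def etl_server(plan_name: str, source_rows: list[dict]) -> list[dict]:
    """Engine: look up the plan's metadata, then extract, transform, and load."""
    plan = REPOSITORY[plan_name]
    src_cols = plan["source"]["columns"]
    tgt_cols = plan["target"]["columns"]
    loaded = []
    for row in source_rows:
        # Extract source columns and map them positionally onto target columns.
        out = {t: row[s] for s, t in zip(src_cols, tgt_cols)}
        # Apply the transformation rules defined for target columns.
        for col, func in plan["rule"].items():
            out[col] = func(out[col])
        loaded.append(out)
    return loaded

if __name__ == "__main__":
    print(etl_server("load_customers", [{"id": 1, "name": "ann"}]))
```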

DIFFERENCES BETWEEN OLTP & DWH:

OLTP | DWH
i) Designed to perform business transactional processing | Designed to support the decision-making process
ii) Volatile data | Non-volatile data
iii) Current data | Historical data
iv) Detailed data | Summarized data
v) Designed for running the business | Designed for analyzing the business
vi) Designed for clerical access | Designed for managerial access
vii) Normalized data | Denormalized data

DATA MODELING

The process of designing a database is known as Data Modeling.

A database architect (or data modeler) creates database designs using a GUI-based database design tool called ERwin (a product of Computer Associates).
A DWH is designed with the following types of schemas:
i) Star Schema
ii) Snowflake Schema
iii) Galaxy Schema (Hybrid Schema, Constellation Schema, Bus Schema)

STAR SCHEMA: A Star Schema is a database design that contains a centrally located FACT table surrounded by multiple dimension tables.
In a Data Warehouse, FACTS are numeric, but not every numeric value is a FACT. Only numerics that are Key Performance Indicators (KPIs) are known as FACTS.
A FACT table contains FACTS. FACTS are business measures; they are used to evaluate (or analyze) the performance of the enterprise business.
A dimension is descriptive data used to analyze the key performance indicators known as FACTS.
Dimensions are organized in dimension tables.
Since the database design looks like a star, it is known as a Star Schema.
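A small star schema can be sketched in SQLite: one central fact table holding the FACTS (`quantity`, `revenue`) and foreign keys to three dimension tables. All table and column names, and the sample rows, are hypothetical. Note that each dimension needs only one join to reach the fact table.

```python
import sqlite3

STAR_DDL = """
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,  -- FACT (KPI)
    revenue      REAL      -- FACT (KPI)
);
"""

def build_sample() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_DDL)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(20230101, 2023, 1), (20240101, 2024, 1)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [(1, "Pen", "Stationery"), (2, "Tea", "Grocery")])
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [(1, "Ann", "Pune")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(20230101, 1, 1, 2, 4.0), (20240101, 1, 1, 1, 2.0), (20240101, 2, 1, 3, 9.0)])
    return conn

def revenue_by_category_and_year(conn: sqlite3.Connection) -> list[tuple]:
    # Each dimension joins directly to the central fact table.
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d    ON d.date_key = f.date_key
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()

if __name__ == "__main__":
    print(revenue_by_category_and_year(build_sample()))
```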
NOTE:
LOAD FREQUENCY: A. Daily Load
B. Weekly Load
LOAD ORDER: A. First, load the dimension tables.
B. If all dimension loads succeed, then load the data into the FACT table.
LOAD TYPES: A. Initial Load
B. Incremental Load
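The load order rule can be sketched with a database transaction: all dimension inserts run first, and the fact rows are inserted only if none of them failed; otherwise the whole batch is rolled back. The function name and the `(insert_sql, rows)` input shape are hypothetical, and the sketch relies on Python's `sqlite3` default behaviour of opening a transaction implicitly before an `INSERT`.

```python
import sqlite3

def load_in_order(conn: sqlite3.Connection, dim_loads: list, fact_sql: str, fact_rows: list) -> bool:
    """Load every dimension first; load the FACT table only if all dimension loads succeed.

    dim_loads: list of (insert_sql, rows) pairs, one per dimension table.
    Returns True if the fact table was loaded, False if the batch was rolled back.
    """
    try:
        for sql, rows in dim_loads:        # A. dimension tables first
            conn.executemany(sql, rows)
    except sqlite3.Error:
        conn.rollback()                    # a dimension load failed: undo it and skip the facts
        return False
    conn.executemany(fact_sql, fact_rows)  # B. then the FACT table
    conn.commit()
    return True
```

A failed dimension load (for example a duplicate primary key) leaves both the dimension and the fact table unchanged.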

SNOWFLAKE SCHEMA: A very large dimension table is split (normalized) into one or more dimension tables, which reduces table space considerably.
It can improve query performance.
Disadvantage: as the number of tables increases, the number of joins increases; as a result, query performance may also degrade.

In a Snowflake Schema a dimension table may have parent tables, whereas in a Star Schema no dimension table has a parent.
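To show the trade-off, the sketch below snowflakes a hypothetical product dimension into product, subcategory, and category tables. `dim_product` now has a parent (`dim_subcategory`), which has its own parent (`dim_category`), so reaching the category name takes three joins instead of one.

```python
import sqlite3

SNOWFLAKE_DDL = """
CREATE TABLE dim_category    (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_subcategory (subcategory_key INTEGER PRIMARY KEY, subcategory_name TEXT,
                              category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE dim_product     (product_key INTEGER PRIMARY KEY, product_name TEXT,
                              subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key));
CREATE TABLE fact_sales      (product_key INTEGER REFERENCES dim_product(product_key), revenue REAL);
"""

def build_sample() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SNOWFLAKE_DDL)
    conn.executescript("""
        INSERT INTO dim_category VALUES (1, 'Stationery');
        INSERT INTO dim_subcategory VALUES (10, 'Pens', 1), (11, 'Paper', 1);
        INSERT INTO dim_product VALUES (100, 'Gel Pen', 10), (101, 'A4 Ream', 11);
        INSERT INTO fact_sales VALUES (100, 2.5), (101, 5.0);
    """)
    return conn

def revenue_by_category(conn: sqlite3.Connection) -> list[tuple]:
    # Fact -> product -> subcategory -> category: one extra join per parent level.
    return conn.execute("""
        SELECT c.category_name, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product p     ON p.product_key = f.product_key
        JOIN dim_subcategory s ON s.subcategory_key = p.subcategory_key
        JOIN dim_category c    ON c.category_key = s.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()

if __name__ == "__main__":
    print(revenue_by_category(build_sample()))
```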

GALAXY SCHEMA (FACT CONSTELLATION): It is a design in which two or more FACT tables from multiple schemas are joined through shared dimension tables.
[Figure: Galaxy Schema, where D1 & D2 are Conformed Dimensions.]

CONFORMED DIMENSIONS: A dimension table that is shared by multiple FACT tables is known as a Conformed Dimension.
Ex: - Customer and Time.

JUNK DIMENSION: A dimension made up of flags (0 or 1) and Booleans (YES or NO) that are not used to describe the Key Performance Indicators is known as a Junk Dimension.
Ex: - Gender_Flag, Product_Promotion_Flag.

DIRTY DIMENSION: In a dimension table, if a record exists more than once with a change in a non-key attribute, the dimension is known as a Dirty Dimension.
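One common way to build a junk dimension is to take every combination of the low-cardinality flags and give each combination a surrogate key, so the fact table carries a single `junk_key` instead of several flag columns. The flag names below follow the examples above; the function name and `junk_key` column are hypothetical.

```python
from itertools import product

def build_junk_dimension(flags: dict[str, tuple]) -> list[dict]:
    """Cartesian product of flag values; each combination gets a surrogate key."""
    names = list(flags)
    rows = []
    for key, combo in enumerate(product(*(flags[n] for n in names)), start=1):
        rows.append({"junk_key": key, **dict(zip(names, combo))})
    return rows

if __name__ == "__main__":
    for row in build_junk_dimension({"gender_flag": ("M", "F"),
                                     "product_promotion_flag": ("YES", "NO")}):
        print(row)
```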

SLOWLY CHANGING DIMENSIONS (a practically tough topic): A dimension that can change over a period of time is known as a Slowly Changing Dimension.
There are three types of slowly changing dimensions:
I. TYPE1 DIMENSION
II. TYPE2 DIMENSION
III. TYPE3 DIMENSION

TYPE1 DIMENSION: A Type1 dimension stores only the current values in the target. It does not store history.
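A Type1 change is a plain overwrite. The sketch assumes the dimension is held as a dictionary keyed by a hypothetical natural key `customer_id`; in SQL this corresponds to an `UPDATE` of the existing row.

```python
def apply_type1(dimension: dict[int, dict], customer_id: int, changes: dict) -> None:
    """Type 1 SCD: overwrite attributes in place (insert if new); old values are lost."""
    row = dimension.setdefault(customer_id, {"customer_id": customer_id})
    row.update(changes)

if __name__ == "__main__":
    dim = {1: {"customer_id": 1, "city": "Pune"}}
    apply_type1(dim, 1, {"city": "Delhi"})
    print(dim)
```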

TYPE2 DIMENSION: A Type2 dimension stores the complete history in the target.
For each update in the OLTP source, it inserts a new record into the target. A surrogate key, a system-generated sequence number, is defined as the Primary Key.
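A minimal in-memory sketch of Type2 behaviour for one tracked attribute (`city`). Besides the surrogate key described above, it adds `start_date`, `end_date`, and `is_current` columns; these are a common convention assumed here for illustration, not something the notes prescribe.

```python
from datetime import date
from itertools import count

class Type2Dimension:
    """Type 2 SCD: every change inserts a new row keyed by a surrogate key."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self._next_key = count(1)  # system-generated sequence for the surrogate key

    def apply(self, customer_id: int, city: str, as_of: date) -> None:
        current = next((r for r in self.rows
                        if r["customer_id"] == customer_id and r["is_current"]), None)
        if current is not None:
            if current["city"] == city:
                return                     # nothing changed in the OLTP source
            current["is_current"] = False  # expire the old version; it stays as history
            current["end_date"] = as_of
        self.rows.append({
            "customer_sk": next(self._next_key),  # surrogate key (primary key)
            "customer_id": customer_id,           # natural key from the OLTP system
            "city": city,
            "start_date": as_of,
            "end_date": None,
            "is_current": True,
        })

if __name__ == "__main__":
    dim = Type2Dimension()
    dim.apply(101, "Pune", date(2023, 1, 1))
    dim.apply(101, "Delhi", date(2024, 6, 1))
    for row in dim.rows:
        print(row)
```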

TYPE3 DIMENSION: A Type3 dimension stores just the current and previous values in the target (partial history).
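A Type3 sketch keeps two columns per tracked attribute, here the hypothetical `current_city` and `previous_city`. Each change shifts the current value into the previous column, so anything older than one change is lost.

```python
def apply_type3(dimension: dict[int, dict], customer_id: int, new_city: str) -> None:
    """Type 3 SCD: keep the current value plus one previous value."""
    row = dimension.get(customer_id)
    if row is None:
        dimension[customer_id] = {"customer_id": customer_id,
                                  "current_city": new_city, "previous_city": None}
    elif row["current_city"] != new_city:
        row["previous_city"] = row["current_city"]  # older history is overwritten
        row["current_city"] = new_city

if __name__ == "__main__":
    dim: dict[int, dict] = {}
    for city in ("Pune", "Delhi", "Mumbai"):
        apply_type3(dim, 1, city)
    print(dim)
```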

TYPES OF FACT TABLES:
Detailed Fact Table: A FACT table that contains the details of the transactions is known as a Detailed Fact Table.

Summarized Fact Table: A FACT table that contains aggregated facts is known as a Summary FACT Table.

Fact-Less Fact Table: A FACT table without any FACTS is known as a Fact-Less FACT Table.
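A fact-less fact table still supports analysis, because each row records that an event happened. The sketch assumes a hypothetical attendance table holding only dimension keys; the "measure" is obtained by counting rows.

```python
from collections import Counter

# Hypothetical fact-less fact table: each row says a student attended a class on a date.
# It holds only dimension keys; there is no numeric FACT column.
attendance = [
    {"date_key": 20240101, "student_key": 1, "class_key": 10},
    {"date_key": 20240101, "student_key": 2, "class_key": 10},
    {"date_key": 20240102, "student_key": 1, "class_key": 11},
]

def attendance_by_class(rows: list[dict]) -> dict:
    """Count events (rows) per class, the usual way to query a fact-less fact table."""
    return dict(Counter(r["class_key"] for r in rows))

if __name__ == "__main__":
    print(attendance_by_class(attendance))
```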

TYPES OF FACTS: A fact table can have three types of FACTS:

Additive FACTS: A FACT that can be summarized across all the dimensions is known as an Additive FACT.
Ex: - Quantity, Revenue

Semi-Additive FACTS: A FACT that can be summarized across some of the dimensions but not across all of them is known as a Semi-Additive FACT.
Ex: - Current Balance

Non-Additive FACTS: A FACT that cannot be summarized across any of the dimensions is known as a Non-Additive FACT.
Ex: - Discount, Percentage
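The three kinds of FACTS behave differently under summation. The sketch uses hypothetical daily balance snapshots and sales lines: quantity is summed freely, balance is summed across accounts for one day but not across days, and a discount percentage is recomputed from its additive parts rather than summed.

```python
# Hypothetical daily account snapshots: (date, account, balance)
snapshots = [
    ("2024-01-01", "A", 100.0), ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0), ("2024-01-02", "B", 30.0),
]
# Hypothetical sales lines: (quantity, revenue, discount_amount, list_amount)
sales = [(2, 90.0, 10.0, 100.0), (1, 150.0, 50.0, 200.0)]

# Additive: quantity can be summed along every dimension.
total_qty = sum(q for q, _, _, _ in sales)

def total_balance_on(day: str) -> float:
    """Semi-additive: balances may be summed across accounts for ONE date."""
    return sum(b for d, _, b in snapshots if d == day)

def closing_balance(account: str) -> float:
    """Across dates, take the latest balance instead of summing."""
    return max((d, b) for d, a, b in snapshots if a == account)[1]

def discount_pct() -> float:
    """Non-additive: recompute the ratio from additive parts (10% + 25% is NOT 35%)."""
    return 100 * sum(s[2] for s in sales) / sum(s[3] for s in sales)

if __name__ == "__main__":
    print(total_qty, total_balance_on("2024-01-02"), closing_balance("A"), discount_pct())
```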

ONLINE ANALYTICAL PROCESSING (OLAP): OLAP is a set of specifications that allows decision makers (or end users) to query data from the database and present it for analysis in a template called a Report.

The following are the types of OLAP:

Relational OLAP (R-OLAP): An OLAP which can query data from relational data sources is known as R-OLAP.
Ex: - Cognos, Business Objects, MicroStrategy.

Multi-Dimensional OLAP (M-OLAP): An OLAP which can query data from a multi-dimensional database (CUBE) is known as M-OLAP.
Ex: - Cognos, Business Objects, MicroStrategy.

Hybrid OLAP (H-OLAP): An OLAP which supports the combined properties of R-OLAP and M-OLAP is known as H-OLAP.
Ex: - Cognos, Business Objects, MicroStrategy.

Desktop OLAP (D-OLAP): An OLAP which can query data from desktop databases such as Text Files, XML Files, and Excel is known as D-OLAP.
Ex: - Cognos, Business Objects, MicroStrategy.
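The difference between querying relational tables and querying a CUBE can be illustrated with a toy cube: totals for every combination of dimension values, including "ALL" roll-ups, are pre-computed once, so a query becomes a lookup instead of a `GROUP BY`. This is a simplified model for illustration, not how any specific M-OLAP product stores its cubes; `ALL` and the sample facts are hypothetical.

```python
from itertools import product

ALL = "ALL"

def build_cube(facts: list[tuple[str, int, float]]) -> dict:
    """Pre-aggregate revenue for every (region, year) cell, including ALL roll-ups."""
    cube: dict[tuple, float] = {}
    for region, year, revenue in facts:
        # Each fact contributes to its own cell and to the roll-up cells above it.
        for key in product((region, ALL), (year, ALL)):
            cube[key] = cube.get(key, 0.0) + revenue
    return cube

if __name__ == "__main__":
    cube = build_cube([("East", 2023, 100.0), ("East", 2024, 50.0), ("West", 2023, 70.0)])
    print(cube[("East", ALL)], cube[(ALL, 2023)], cube[(ALL, ALL)])
```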
