
Data Warehouse
Agenda
- What is a Data Warehouse?
- Transaction System vs. Data Warehouse
- Data Warehouse Architecture
- Metadata
- Data Flows
- Issues in Building a Data Warehouse
- Warehouse Schema
- Tools & Technologies
- Advantages of a Data Warehouse
- Problems
- Data Mart
- Data Mining
What is a Data Warehouse?
- A collection of integrated, subject-oriented, time-variant, and non-volatile data in support of management's decision-making process.
- Described as the "single point of truth", the "corporate memory", and the sole historical register of virtually all transactions that occur in the life of an organization.

Transaction System vs. Data Warehouse

| Transaction System | Data Warehouse |
|---|---|
| Supports day-to-day operational processes | Supports management analysis and decision-making processes |
| Contains raw, detailed data that has not been refined or cleansed | Contains summarized, refined, and cleansed information |
| Volatile: data changes from day to day, with frequent updates | Non-volatile: provides a data "snapshot"; adjustments are not permitted, or are limited |
| Technical issues drive the data structure and system design | Business analysis requirements drive the data structure and system design |
| Disparate data structures, physical locations, query types, etc. | Integrated, consistent information on a single technology platform |
| Users rely on technical analysts for reporting needs | Users have direct, fast access via online analytical processing (OLAP) tools |
| Operational processes are impacted by queries run against the system | Minimal impact on operational processes |
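To make the volatile vs. non-volatile row of the table concrete, here is a minimal Python sketch. It assumes a single hypothetical account balance: the transaction system overwrites the value in place, while the warehouse appends a dated, immutable snapshot at each load, so the history survives. The names `oltp_balances`, `load_snapshot`, and `balance_history` and all values are illustrative only.

```python
from dataclasses import dataclass
from datetime import date

# Transaction system (hypothetical): volatile, so the current value is simply overwritten.
oltp_balances: dict[str, float] = {"ACC-1": 100.0}


def oltp_update(account: str, new_balance: float) -> None:
    """Overwrite the balance in place; the previous value is lost."""
    oltp_balances[account] = new_balance


@dataclass(frozen=True)
class Snapshot:
    """One immutable, dated row in the warehouse."""
    account: str
    as_of: date
    balance: float


# Data warehouse (hypothetical): non-volatile and time-variant, so rows are only appended.
warehouse: list[Snapshot] = []


def load_snapshot(as_of: date) -> None:
    """Copy the current transaction-system state into the warehouse as a dated snapshot."""
    for account, balance in oltp_balances.items():
        warehouse.append(Snapshot(account, as_of, balance))


def balance_history(account: str) -> list[tuple[date, float]]:
    """History can only be answered from the warehouse snapshots."""
    return [(s.as_of, s.balance) for s in warehouse if s.account == account]


load_snapshot(date(1996, 1, 31))
oltp_update("ACC-1", 250.0)
load_snapshot(date(1996, 2, 29))

print(oltp_balances)             # {'ACC-1': 250.0}
print(balance_history("ACC-1"))  # both the January and the February balance
```

The transaction system can only report the latest balance, while the warehouse still holds every value that was loaded.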

Data Warehouse Architecture
[Figure: ODS 1, ODS 2, and ODS 3 (operational data stores) feed the load manager; the warehouse manager maintains meta-data, highly summarized data, lightly summarized data, detailed data, and archive/backup data in the DBMS; the query manager serves the end-user access tools: reporting, query, application development, and EIS tools; OLAP tools; and data mining.]

- Operational data store (ODS): a repository of current, integrated operational data used for analysis.
- Load manager: performs all the operations associated with the extraction and loading of data into the warehouse.
- Warehouse manager: performs all the operations associated with the management of the data in the warehouse.
- Query manager: also called the backend component; performs all the operations associated with the management of user queries.
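The division of labour among these components can be sketched as a toy, in-memory Python program. It assumes two hypothetical sources with inconsistent field names and formats, a load manager that extracts and cleanses them, a warehouse manager that builds a summary of the loaded data, and a query manager that answers queries from that summary. All class names, fields, and values are hypothetical, not a real product's API.

```python
from collections import defaultdict

# Hypothetical operational sources with inconsistent field names and formats.
SOURCE_A = [{"region": "ne", "amount": "120.50"}, {"region": "SW", "amount": "80"}]
SOURCE_B = [{"Region": "NE ", "Amount": 30.0}]


class LoadManager:
    """Extracts rows from each source, cleanses them, and loads them as detailed data."""

    def load(self, detailed: list[dict]) -> None:
        for row in SOURCE_A:
            detailed.append({"region": row["region"].strip().upper(),
                             "amount": float(row["amount"])})
        for row in SOURCE_B:
            detailed.append({"region": row["Region"].strip().upper(),
                             "amount": float(row["Amount"])})


class WarehouseManager:
    """Manages data already in the warehouse; here it builds a per-region summary."""

    def summarize(self, detailed: list[dict]) -> dict[str, float]:
        totals: dict[str, float] = defaultdict(float)
        for row in detailed:
            totals[row["region"]] += row["amount"]
        return dict(totals)


class QueryManager:
    """Answers user queries, preferring the summary and falling back to detailed data."""

    def __init__(self, detailed: list[dict], summary: dict[str, float]):
        self.detailed, self.summary = detailed, summary

    def total_sales(self, region: str) -> float:
        if region in self.summary:
            return self.summary[region]
        return sum(r["amount"] for r in self.detailed if r["region"] == region)


detailed: list[dict] = []
LoadManager().load(detailed)
summary = WarehouseManager().summarize(detailed)
qm = QueryManager(detailed, summary)
print(qm.total_sales("NE"))  # 150.5
```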

 End-user access toolscan be categorized into five main groups:
data reporting and query tools, application development tools,
executive information system (EIS) tools, online analytical
processing (OLAP) tools, and data mining tools

 Summarized data-> Stores all th aggregations generated by


warehouse manager.Exists to speed up performance of queries and
do not require backup

 Archive/backup data-> Backup ensures recovery of Data


Warehouse from any data loss or any failure.
In archiving, older data is removed from the system in a format that
allows it to be qickly restored if required.

 Meta-data
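A minimal Python sketch of how the warehouse manager might derive lightly and highly summarized data from detailed data. The rows, the month/region grain for the light summary, and the yearly grain for the heavy summary are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical detailed rows: (sale date "YYYY-MM-DD", region, amount).
DETAILED = [
    ("1996-01-05", "NE", 100.0),
    ("1996-01-20", "NE", 50.0),
    ("1996-02-11", "SW", 70.0),
    ("1996-02-14", "NE", 30.0),
]


def lightly_summarized(rows):
    """Aggregate detailed rows to (month, region): still fairly fine-grained."""
    out = defaultdict(float)
    for day, region, amount in rows:
        out[(day[:7], region)] += amount
    return dict(out)


def highly_summarized(light):
    """Roll the lightly summarized data up further, to one total per year."""
    out = defaultdict(float)
    for (month, _region), amount in light.items():
        out[month[:4]] += amount
    return dict(out)


light = lightly_summarized(DETAILED)
heavy = highly_summarized(light)
print(light)  # {('1996-01', 'NE'): 150.0, ('1996-02', 'SW'): 70.0, ('1996-02', 'NE'): 30.0}
print(heavy)  # {'1996': 250.0}
```

Because both summaries are recomputed from `DETAILED`, they can be regenerated after a failure rather than restored, which is one way to read the point that summarized data does not require backup.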

Importance of Meta-Data
- Meta-data: data about data.
- The purpose of meta-data is to show the pathway back to where the data began, so that warehouse administrators know the history of any item in the warehouse.
- The meta-data associated with data transformation and loading must describe the source data and any changes that were made to it.
- The meta-data associated with data management describes the data as it is stored in the warehouse.
- The meta-data is required by the query manager to generate appropriate queries, and is also associated with the users of those queries.
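As an illustration of the "pathway back to where the data began", a lineage record for one warehouse column might look like the following Python sketch. The `ColumnMetadata` class, the source system `order_entry`, the column names, and the listed transformations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical lineage record describing where one warehouse column came from."""
    warehouse_column: str
    source_system: str
    source_column: str
    transformations: list[str] = field(default_factory=list)

    def lineage(self) -> str:
        """Return the pathway from the source column, through each change, to the warehouse."""
        steps = [f"{self.source_system}.{self.source_column}", *self.transformations,
                 self.warehouse_column]
        return " -> ".join(steps)


net_amount = ColumnMetadata(
    warehouse_column="sales_fact.net_amount",
    source_system="order_entry",
    source_column="line_amount",
    transformations=["convert currency", "subtract returns", "sum by month"],
)
print(net_amount.lineage())
```

The same record serves both uses described above: it documents the changes made during transformation and loading, and it tells an administrator where a warehouse value originated.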

Data Flows

- Inflow: the processes associated with the extraction, cleansing, and loading of data from the source systems into the data warehouse.
- Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distributing the data.
- Downflow: the processes associated with archiving and backing up data in the warehouse.
- Outflow: the processes associated with making the data available to end users.
- Meta-flow: the processes associated with the management of the meta-data.
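The archiving part of the downflow can be sketched as moving rows older than a cutoff out of the warehouse into a format from which they can be restored. This minimal Python version assumes rows keyed by a `YYYY-MM` month string (which sorts correctly as text) and uses JSON as the archive format; the data, the cutoff, and the function names are hypothetical.

```python
import json

# Hypothetical warehouse rows keyed by month in "YYYY-MM" form.
warehouse = [
    {"month": "1994-11", "net_sales": 18000},
    {"month": "1996-01", "net_sales": 30000},
    {"month": "1996-02", "net_sales": 23000},
]


def archive_before(rows: list[dict], cutoff: str) -> tuple[list[dict], str]:
    """Downflow: move rows older than `cutoff` out of the warehouse.

    Returns the rows that stay online plus a JSON archive that can be restored later.
    """
    keep = [r for r in rows if r["month"] >= cutoff]
    old = [r for r in rows if r["month"] < cutoff]
    return keep, json.dumps(old)


def restore(rows: list[dict], archive: str) -> list[dict]:
    """Bring archived rows back online, in month order, if they are needed again."""
    return sorted(rows + json.loads(archive), key=lambda r: r["month"])


online, archive = archive_before(warehouse, cutoff="1996-01")
print(len(online))                            # 2
print(restore(online, archive) == warehouse)  # True
```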

[Figure: Information flows of a data warehouse. Operational data sources 1 to n and the operational data store (ODS) pass data through the load manager (inflow) into the DBMS, which holds detailed data, lightly summarized data, highly summarized data, and meta-data under the warehouse manager (upflow, meta-flow, and downflow to archive/backup data); the query manager delivers data to the end-user access tools (outflow): reporting, query, application development, and EIS (executive information system) tools; OLAP (online analytical processing) tools; and data mining tools.]

Issues to Be Addressed in Building a Data Warehouse
- When and how to gather data?
- What schema to use?
- How to cleanse the data?
- How to propagate updates?
- What data to summarize?

Warehouse Schema

- Fact table: stores the business data. The data in a fact table are called facts, and fact tables contain multidimensional data.
- Dimension table: to minimize storage requirements, the dimension attributes in a fact table are usually short identifiers that are foreign keys into other tables, called dimension tables.
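A minimal Python sketch of the fact/dimension split, using a few product and sales values taken from the sales example in this document: each fact row carries only a short `prod_id` key and its numeric measure, while the longer product description is stored once in the dimension table. The class and function names are illustrative, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductDim:
    """Dimension row: descriptive attributes, stored once per product."""
    prod_id: int
    prod_desc: str


@dataclass(frozen=True)
class SalesFact:
    """Fact row: short keys plus a numeric measure (the fact itself)."""
    month: str     # key into a time dimension
    prod_id: int   # foreign key into the product dimension
    net_sales: int


PRODUCT_DIM = {p.prod_id: p for p in (
    ProductDim(100, "Power supply"),
    ProductDim(140, "Motherboard"),
    ProductDim(220, "Co-processor"),
)}
FACTS = [
    SalesFact("01-1996", 100, 30000),
    SalesFact("02-1996", 140, 23000),
    SalesFact("03-1996", 220, 32000),
]


def describe(fact: SalesFact) -> str:
    """Resolve the fact's foreign key to the product description in the dimension."""
    return f"{fact.month} {PRODUCT_DIM[fact.prod_id].prod_desc}: {fact.net_sales}"


for fact in FACTS:
    print(describe(fact))  # e.g. "01-1996 Power supply: 30000"
```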

Schema with Fact & Dimension Tables
[Figure: dimension tables PRODUCT (name of the product, product number, description of product), AREA (Area 1, Area 2, Area 3), and DURATION (year, beginning date, completion date).]

Star Schema
- The fact table is in the center, and all the dimension tables are attached to the central fact table.
Example: Sales Processing
[Figure: fact table SALES at the center, with the dimension tables PRODUCT, AREA, TIME, and CUSTOMER attached to it.]
Dimension Tables

Region_Dimension_Table

| region_id | region_doc |
|---|---|
| NE | Northeast |
| NW | Northwest |
| SE | Southeast |
| SW | Southwest |

Product_Dimension_Table

| prod_grp_id | prod_id | prod_grp_desc | prod_desc |
|---|---|---|---|
| 10 | 100 | Fewer devices | Power supply |
| 20 | 140 | Circuit boards | Motherboard |
| 30 | 220 | Components | Co-processor |

Account_Dimension_Table

| account_id | account_doc |
|---|---|
| 100000 | ABC Electronics |
| 110000 | Midway Electric |
| 120000 | Victor Components |
| 130000 | Washburn, Inc. |
| 140000 | Zerox |

Fact Table: Monthly_Sales_Summary_Table

| month | prod_id | region_id | account_id | vend_id | net-sales | gross_sales |
|---|---|---|---|---|---|---|
| 01-1996 | 100 | SW | 100000 | 100 | 30,000 | 50,000 |
| 02-1996 | 140 | NE | 110000 | 200 | 23,000 | 42,000 |
| 03-1996 | 220 | SW | 100000 | 300 | 32,000 | 49,000 |

Vendor_Dimension_Table

| vend_id | vendor_desc |
|---|---|
| 100 | PowerAge, Inc. |
| 200 | Advanced Micro Devices |
| 300 | Farad Incorporated |

Time_Dimension_Table

| month | mo_in_fiscal_yr | month_name |
|---|---|---|
| 01-1996 | 4 | January |
| 02-1996 | 5 | February |
| 03-1996 | 6 | March |
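The star schema example above can be loaded into an in-memory SQLite database with Python's standard `sqlite3` module and queried with a star join. This is a sketch: the Account dimension is omitted for brevity, the table names are shortened, and `net-sales` is spelled `net_sales` so that it is a valid unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_grp_id INTEGER,
                          prod_grp_desc TEXT, prod_desc TEXT);
CREATE TABLE region_dim (region_id TEXT PRIMARY KEY, region_doc TEXT);
CREATE TABLE vendor_dim (vend_id INTEGER PRIMARY KEY, vendor_desc TEXT);
CREATE TABLE time_dim (month TEXT PRIMARY KEY, mo_in_fiscal_yr INTEGER, month_name TEXT);
-- Fact table: a foreign key to each dimension plus the numeric measures.
CREATE TABLE monthly_sales_summary (
    month TEXT REFERENCES time_dim, prod_id INTEGER REFERENCES product_dim,
    region_id TEXT REFERENCES region_dim, account_id INTEGER,
    vend_id INTEGER REFERENCES vendor_dim,
    net_sales INTEGER, gross_sales INTEGER);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (100, 10, "Fewer devices", "Power supply"),
    (140, 20, "Circuit boards", "Motherboard"),
    (220, 30, "Components", "Co-processor")])
conn.executemany("INSERT INTO region_dim VALUES (?, ?)", [
    ("NE", "Northeast"), ("NW", "Northwest"), ("SE", "Southeast"), ("SW", "Southwest")])
conn.executemany("INSERT INTO vendor_dim VALUES (?, ?)", [
    (100, "PowerAge, Inc."), (200, "Advanced Micro Devices"), (300, "Farad Incorporated")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [
    ("01-1996", 4, "January"), ("02-1996", 5, "February"), ("03-1996", 6, "March")])
conn.executemany("INSERT INTO monthly_sales_summary VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("01-1996", 100, "SW", 100000, 100, 30000, 50000),
    ("02-1996", 140, "NE", 110000, 200, 23000, 42000),
    ("03-1996", 220, "SW", 100000, 300, 32000, 49000)])

# Star join: the central fact table joins directly to each dimension it needs.
rows = conn.execute("""
    SELECT r.region_doc, SUM(f.net_sales) AS net, SUM(f.gross_sales) AS gross
    FROM monthly_sales_summary AS f
    JOIN region_dim AS r ON r.region_id = f.region_id
    JOIN time_dim   AS t ON t.month = f.month
    WHERE t.mo_in_fiscal_yr BETWEEN 4 AND 6
    GROUP BY r.region_doc
    ORDER BY r.region_doc
""").fetchall()
print(rows)  # [('Northeast', 23000, 42000), ('Southwest', 62000, 99000)]
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries simple.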
Snowflake Schema
- Consists of a fact table and normalized dimension tables.
Disadvantages:
- Data becomes harder to manage
- Data is more difficult to retrieve
- Metadata becomes more complex
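To show why normalizing dimensions makes data harder to retrieve, here is a small SQLite sketch (via Python's `sqlite3`) in which the product dimension is split into separate category and manufacturer tables, so reporting sales by category needs an extra join. The table names, the manufacturer name `Acme Corp`, and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category and manufacturer live in their own tables.
CREATE TABLE category     (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                      category_id INTEGER REFERENCES category,
                      manufacturer_id INTEGER REFERENCES manufacturer);
CREATE TABLE sales (product_id INTEGER REFERENCES product, amount INTEGER);
INSERT INTO category VALUES (1, 'Components');
INSERT INTO manufacturer VALUES (1, 'Acme Corp');
INSERT INTO product VALUES (10, 'Co-processor', 1, 1), (11, 'Motherboard', 1, 1);
INSERT INTO sales VALUES (10, 500), (11, 300), (10, 200);
""")

# Reaching the category name now takes two joins (sales -> product -> category)
# instead of the single join a star-schema product dimension would need.
rows = conn.execute("""
    SELECT c.category_name, SUM(s.amount)
    FROM sales s
    JOIN product  p ON p.product_id  = s.product_id
    JOIN category c ON c.category_id = p.category_id
    GROUP BY c.category_name
""").fetchall()
print(rows)  # [('Components', 1000)]
```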

Snowflake Schema
[Figure: fact table SALES with the dimension tables PRODUCT, AREA, TIME, and CUSTOMER; the PRODUCT dimension is further normalized into Product Category and Product Manufacturer tables.]
Starflake Schema
- A combination of the star schema and the snowflake schema.
- Consists of a fact table, star dimensions, and snowflake dimensions.

Starflake Schema
[Figure: fact table SALES with star dimensions and a snowflake dimension; the Product dimension is shown with Price and Weight, and the Location dimension with Location 1 and Location 2.]

Tools and Technologies

Tools and technologies used in the construction of a data warehouse:

- Data extraction: SAS
- Data cleansing: Apertus, Trillium
- Data storage: Oracle, Sybase

Advantages of Using a Data Warehouse
- End users can access a wide variety of data
- Supports business decision making for future planning
- Increases data consistency
- Increases productivity
- Decreases computing costs
- Combines data from different sources

Problems
- Increased end-user demands
- High demand for resources
- High maintenance
- Extracting, cleansing, and loading data can be time-consuming.
- Data warehousing can increase project scope.
- Compatibility problems with systems already in place, e.g. transaction processing systems.
- End users may be given training and then not use the data warehouse.
- Security can become a serious issue, especially if the data warehouse is web-accessible.

Data Mart
- A subset of a data warehouse that supports the requirements of a particular department or business function.
- The characteristics that differentiate data marts from data warehouses include:
  - A data mart focuses only on the requirements of users associated with one department or business function.
  - Data marts do not normally contain detailed operational data, unlike data warehouses.
  - Because data marts contain less data than data warehouses, they are easier to understand and navigate.
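A data mart can be sketched as a filtered, summarized extract of the warehouse. This minimal Python example assumes hypothetical warehouse rows tagged with an owning department and builds a sales mart that keeps only that department's data, aggregated by month rather than stored as detailed rows. The row layout and `build_mart` are illustrative only.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (month, department, region, amount).
WAREHOUSE = [
    ("01-1996", "sales", "SW", 30000),
    ("01-1996", "sales", "NE", 12000),
    ("01-1996", "finance", "SW", 4000),
    ("02-1996", "sales", "SW", 18000),
]


def build_mart(rows, department):
    """Keep one department's rows and store them summarized by month, not as detail."""
    mart = defaultdict(int)
    for month, dept, _region, amount in rows:
        if dept == department:
            mart[month] += amount
    return dict(mart)


sales_mart = build_mart(WAREHOUSE, "sales")
print(sales_mart)  # {'01-1996': 42000, '02-1996': 18000}
```

The mart is much smaller than the warehouse it came from, which is the source of the faster response times and easier navigation listed for data marts.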
[Figure: Data warehouse architecture with data marts. First tier: operational data sources and ODS 1 to ODS 3 (operational data store), the load manager, warehouse manager, and query manager, and the DBMS holding meta-data, detailed data, lightly and highly summarized data, and archive/backup data, with end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining). Second tier: data marts holding summarized data in a relational database and in a multidimensional database.]
Reasons for Creating a Data Mart
- To give users access to the data they need to analyze most often
- To provide data in a form that matches the collective view of the data held by a group of users in a department or business function
- To improve end-user response time by reducing the volume of data to be accessed
- To provide appropriately structured data, as dictated by the requirements of end-user access tools
- Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier; hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse
Data Mining
- The process of extracting previously unknown, valid, and actionable information from large data sets and then using that information to make crucial business decisions.
- Applications: early warning systems, fraud detection, market research, direct mail.
- Data mining provides techniques to:
  - Detect trends or patterns and find correlations
  - Support data analysis: forecasting and business modeling
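Two of the simplest techniques behind "detect trends" and "find correlations" are a least-squares trend line and the Pearson correlation coefficient. The Python sketch below implements both from scratch; the advertising and sales figures are hypothetical, and real data mining tools use far richer methods.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series (-1 to 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trend_slope(ys: list[float]) -> float:
    """Least-squares slope of ys against time steps 0, 1, 2, ...; positive means rising."""
    xs = range(len(ys))
    mx, my = (len(ys) - 1) / 2, sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical monthly figures for four months.
ad_spend = [10.0, 12.0, 15.0, 18.0]
sales = [100.0, 118.0, 149.0, 181.0]

print(f"correlation: {pearson(ad_spend, sales):.3f}")
print(f"sales trend per month: {trend_slope(sales):.1f}")  # 27.4
```

A correlation close to 1 flags a strong linear relationship worth investigating; it does not by itself show that one series causes the other.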
