
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition

Copyright © 2004 Pearson Education, Inc.


Chapter 28

Overview of Data Warehousing and OLAP

28.2 Characteristics of Data Warehouses

A data warehouse:
• stores integrated data from multiple sources
• supports time-series analysis
• supports trend analysis
• is non-volatile
• stores large amounts of data
• is designed for read access
Slide 28-3
28.2 Characteristics of Data Warehouses
A data warehouse:
• is subject-oriented (such as customers, products, sales) rather than application-oriented (such as billing, inventory, shipping)
• is organized to run queries, produce reports, and perform analysis, rather than to support transactions

Slide 28-4
28.2 Characteristics of Data Warehouses
DBMS for transaction processing    Data warehouse
-------------------------------    ------------------------------------------
holds current data                 holds historical data
stores detailed data               stores detailed and summary data
data is dynamic                    data is largely static
repetitive processing              ad-hoc, unstructured, heuristic processing
transaction driven                 analysis driven
application oriented               subject oriented
predictable pattern of use         unpredictable pattern of use

Slide 28-5
Examples of Types of Queries

Data warehouse for Roll-On trucking company

What is the total revenue for the last 10 years by geographic region?

What is the monthly revenue for each type of shipped product (gasoline, explosives, nuclear waste, WMD, milk) for 2003?

What would be the effect on profits if the gasoline tax were raised 2% in the eastern states?

What is the relationship between the annual revenue of a branch office and the number of sales staff assigned to the office?
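
The first two questions are aggregations over time, region, and product. As a minimal sketch, assuming a single hypothetical shipment table (the table name, columns, and figures below are invented for illustration and are not taken from the Roll-On example), they could be phrased with Python's built-in sqlite3 module:

```python
import sqlite3

# Hypothetical, denormalized shipment table; all names and numbers are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipment "
    "(region TEXT, product TEXT, year INTEGER, month INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO shipment VALUES (?, ?, ?, ?, ?)",
    [
        ("East", "gasoline", 2003, 1, 100.0),
        ("East", "milk", 2003, 1, 40.0),
        ("West", "gasoline", 2003, 2, 70.0),
        ("West", "milk", 2002, 5, 30.0),
        ("East", "gasoline", 1990, 3, 999.0),  # falls outside the 10-year window
    ],
)


def revenue_by_region(conn, first_year, last_year):
    """Total revenue per geographic region for an inclusive range of years."""
    rows = conn.execute(
        "SELECT region, SUM(revenue) FROM shipment "
        "WHERE year BETWEEN ? AND ? GROUP BY region ORDER BY region",
        (first_year, last_year),
    )
    return dict(rows.fetchall())


def monthly_revenue_by_product(conn, year):
    """Revenue per (product, month) for one year."""
    rows = conn.execute(
        "SELECT product, month, SUM(revenue) FROM shipment "
        "WHERE year = ? GROUP BY product, month ORDER BY product, month",
        (year,),
    )
    return rows.fetchall()


by_region = revenue_by_region(conn, 1994, 2003)
by_product = monthly_revenue_by_product(conn, 2003)
```

The last two questions (a what-if scenario and a relationship between two measures) need modeling beyond a single GROUP BY and are not covered by this sketch.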

Slide 28-6
Figure 28.1 Example transactions in market-basket model.

Slide 28-7
28.3 Data Modeling for Data Warehouses

• Data warehouses store data in multidimensional matrices called data cubes

Slide 28-8
Figure 28.2 Two-dimensional matrix model.

Recorded, for example, are sales (amounts) for a particular time period.
Slide 28-9
Figure 28.3 A three-dimensional data cube model.

Slide 28-10
Figure 28.4 Pivoted version of the data cube from Figure 28.3.

Slide 28-11
Figure 28.5 The roll-up operation.

Roll-up groups data into larger units - here a coarser grain of product categories.
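
A roll-up along one dimension can be sketched in a few lines of Python. This is an illustration under assumptions, not the figure's data: the cube cells, the product names, and the product-to-category mapping are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, quarter) -> sales amount.
cube = {
    ("soap", "East", "Q1"): 10,
    ("shampoo", "East", "Q1"): 20,
    ("bread", "East", "Q1"): 5,
    ("soap", "West", "Q1"): 7,
    ("bread", "West", "Q2"): 3,
}

# Hypothetical hierarchy: each product belongs to one coarser category.
category_of = {"soap": "toiletries", "shampoo": "toiletries", "bread": "food"}


def roll_up(cube, axis, parent_of):
    """Replace the coordinate on `axis` by its parent and sum the merged cells."""
    rolled = defaultdict(int)
    for key, amount in cube.items():
        coarse = list(key)
        coarse[axis] = parent_of[key[axis]]
        rolled[tuple(coarse)] += amount
    return dict(rolled)


by_category = roll_up(cube, 0, category_of)
```

Drill-down goes the other way, so it is only possible if the finer-grained cells (here `cube`) are kept alongside the summary.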

Slide 28-12
Figure 28.6 The drill-down operation.

Slide 28-13
Figure 28.7 A star schema with fact and dimensional tables.
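
As a rough sketch of the idea, assuming hypothetical table and column names (they are not copied from the figure), a star schema has one fact table whose rows reference one row in each dimension table. A typical query joins the fact table to its dimensions and aggregates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product and per region (hypothetical names).
CREATE TABLE product (prod_no INTEGER PRIMARY KEY, prod_name TEXT, prod_line TEXT);
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);

-- Fact table: one row per measured combination, pointing at the dimensions.
CREATE TABLE sales_fact (
    prod_no INTEGER REFERENCES product(prod_no),
    region_id INTEGER REFERENCES region(region_id),
    quarter TEXT,
    revenue REAL
);

INSERT INTO product VALUES (1, 'soap', 'toiletries');
INSERT INTO product VALUES (2, 'bread', 'food');
INSERT INTO region VALUES (1, 'East');
INSERT INTO region VALUES (2, 'West');
INSERT INTO sales_fact VALUES (1, 1, 'Q1', 10.0);
INSERT INTO sales_fact VALUES (1, 2, 'Q1', 7.0);
INSERT INTO sales_fact VALUES (2, 1, 'Q1', 5.0);
INSERT INTO sales_fact VALUES (2, 2, 'Q2', 3.0);
""")

# Revenue by product line and region: join the fact table to both dimensions.
revenue_by_line_and_region = conn.execute("""
    SELECT p.prod_line, r.region_name, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN product AS p ON p.prod_no = f.prod_no
    JOIN region AS r ON r.region_id = f.region_id
    GROUP BY p.prod_line, r.region_name
    ORDER BY p.prod_line, r.region_name
""").fetchall()
```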

Slide 28-14
Figure 28.8 A snowflake schema.

Normalized dimension tables


Slide 28-15
Figure 28.9 A fact constellation.

... a set of fact tables that share some dimension tables.

Slide 28-16
Indexing Data Warehouse Data

Indexes speed up queries; in domains with low cardinality, bitmap indexing is used.

Consider 100,000 cars and only four types of cars.

Construct 4 bit vectors of 100,000 bits each.

EconomyCar 00100000000011...100000000000000
CompactCar 10010000010000...010111100101010
MidsizeCar 00001001010000...001000011000000
FullsizeCar 01000110101100...000000000010101

Is car 99,999 an economy car?
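
A minimal sketch of the bit-vector idea, using Python integers as bit vectors. The function names are hypothetical, the five sample cars are chosen to match the first five bit positions shown above, and positions are assumed to be counted from 0. Asking whether a given car is of a given type then becomes a single bit test.

```python
def build_bitmap_index(values):
    """Return {value: bitmap}; bit i of a bitmap is 1 iff values[i] == value."""
    index = {}
    for position, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << position)
    return index


def has_value(index, value, position):
    """Test one bit: does the row at `position` hold `value`?"""
    return ((index.get(value, 0) >> position) & 1) == 1


def count_rows(bitmap):
    """Number of rows whose bit is set."""
    return bin(bitmap).count("1")


# One entry per car; the position in the list is the car's row number (0-based).
car_types = ["CompactCar", "FullsizeCar", "EconomyCar", "CompactCar", "MidsizeCar"]
index = build_bitmap_index(car_types)

car_2_is_economy = has_value(index, "EconomyCar", 2)
car_0_is_economy = has_value(index, "EconomyCar", 0)

# Combining predicates is a bitwise operation on whole vectors.
compact_or_economy = index["CompactCar"] | index["EconomyCar"]
```

With four car types the index needs four vectors of one bit per car, which is why the approach suits low-cardinality domains.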


Slide 28-17
28.4 Building a Data Warehouse

• Data is collected from multiple, heterogeneous sources
• The data must be formatted for consistency
• The data must be "cleansed"
• The data may need to be converted from relational tuples (and other formats) to the multidimensional model
• The data is then loaded into the warehouse
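
The steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the two source layouts, the cleansing rules (drop rows with unparseable amounts, normalize spelling), and the dictionary standing in for the warehouse.

```python
from collections import defaultdict

# Two hypothetical sources that record the same kind of sale differently.
source_a = [
    {"product": "Soap", "region": "east", "quarter": "Q1", "amount": "10.50"},
    {"product": "Bread", "region": "WEST", "quarter": "Q1", "amount": "n/a"},  # bad amount
]
source_b = [
    ("soap ", "East", 1, 4.5),  # (product, region, quarter number, amount)
    ("bread", "West", 2, 3.0),
]


def format_a(record):
    """Bring a source-A dict into the common (product, region, quarter, amount) layout."""
    return (record["product"], record["region"], record["quarter"], record["amount"])


def format_b(record):
    """Bring a source-B tuple into the common layout."""
    product, region, quarter_no, amount = record
    return (product, region, f"Q{quarter_no}", amount)


def cleanse(row):
    """Normalize spelling; return None for rows whose amount cannot be parsed."""
    product, region, quarter, amount = row
    try:
        amount = float(amount)
    except (TypeError, ValueError):
        return None
    return (product.strip().lower(), region.strip().title(), quarter, amount)


def load(rows):
    """Convert cleansed tuples into cube cells keyed by (product, region, quarter)."""
    cube = defaultdict(float)
    for row in rows:
        cleaned = cleanse(row)
        if cleaned is None:
            continue
        product, region, quarter, amount = cleaned
        cube[(product, region, quarter)] += amount
    return dict(cube)


rows = [format_a(r) for r in source_a] + [format_b(r) for r in source_b]
warehouse = load(rows)
```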

Slide 28-18
28.5 Typical Functionality of a Data Warehouse

• Roll-up
• Drill-down
• Pivot (cross-tabulation)
• Slice and dice
• Sorting
• Selection
• Using derived attributes
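
Three of these operations (slice, dice, and pivot) can be illustrated on a tiny cube held in a Python dictionary. This is a sketch under assumptions: the cube contents and function names are hypothetical, and pivot is shown simply as a reordering of the cube's axes.

```python
# Hypothetical cube cells: (product, region, quarter) -> sales amount.
cube = {
    ("soap", "East", "Q1"): 10,
    ("soap", "West", "Q1"): 7,
    ("bread", "East", "Q1"): 5,
    ("bread", "West", "Q2"): 3,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop that dimension."""
    return {
        key[:axis] + key[axis + 1:]: amount
        for key, amount in cube.items()
        if key[axis] == value
    }


def dice(cube, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set."""
    return {
        key: amount
        for key, amount in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


def pivot(cube, order):
    """Reorder the axes, e.g. (1, 0, 2) puts region before product."""
    return {tuple(key[i] for i in order): amount for key, amount in cube.items()}


q1_only = slice_cube(cube, 2, "Q1")
soap_sub_cube = dice(cube, {0: {"soap"}, 1: {"East", "West"}})
region_first = pivot(cube, (1, 0, 2))
```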
Slide 28-19
28.6 Data Warehouse Versus Views

• A data warehouse:
– provides persistent storage
– is usually multidimensional rather than relational
– can be indexed
– provides specific support for its functionality
– provides large amounts of integrated data
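
The persistent-storage point can be made concrete with a small experiment. This is only a sketch: a stored summary table stands in for the warehouse, the table and view names are hypothetical, and SQLite is used because it ships with Python. The view is re-evaluated against the base table on every read, while the stored copy keeps the data it was loaded with and can carry its own index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, revenue REAL);
INSERT INTO sales VALUES ('East', 10.0);
INSERT INTO sales VALUES ('West', 7.0);

-- A view stores only the query text; it is recomputed on each read.
CREATE VIEW revenue_view AS
    SELECT region, SUM(revenue) AS total FROM sales GROUP BY region;

-- A stored summary table stands in for the warehouse's persistent copy.
CREATE TABLE revenue_snapshot AS
    SELECT region, SUM(revenue) AS total FROM sales GROUP BY region;

-- The stored copy can be indexed like any other table.
CREATE INDEX revenue_snapshot_region ON revenue_snapshot (region);
""")

# A later change to the operational data ...
conn.execute("INSERT INTO sales VALUES ('East', 5.0)")

# ... shows up in the view but not in the stored snapshot.
view_rows = conn.execute(
    "SELECT region, total FROM revenue_view ORDER BY region"
).fetchall()
snapshot_rows = conn.execute(
    "SELECT region, total FROM revenue_snapshot ORDER BY region"
).fetchall()
```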

Slide 28-20
28.7 Problems and Open Issues in Data Warehouses

• Difficulties of Implementing Data Warehouses
– expensive
– data quality control
– adaptability to new usage and data
Slide 28-21
28.7 Problems and Open Issues in Data Warehouses

• Open Issues in Data Warehousing
– data cleaning
– indexing
– data partitioning
– automation of acquisition, quality management, performance optimization
– incorporation of business rules
– efficient update

Slide 28-22
