
Data Warehousing

Online Analytical Processing (OLAP)

Purpose of Online Analytical Processing (OLAP)
• Data warehouses bring together large volumes of data for the purposes of data analysis.
• Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.
– There are two main types of access tools available to meet this demand, namely online analytical processing (OLAP) and data mining.
• These tools differ in what they offer the user, and because of this they are complementary technologies.
• A data warehouse (or, more commonly, one or more data marts) together with tools such as OLAP and/or data mining are collectively referred to as Business Intelligence (BI) technologies.

The Complete Decision Support System

[Figure: Information sources (semistructured sources, operational DBs) → extract, transform, load, refresh, etc. → Data Warehouse Server (Tier 1), alongside data marts → serve → OLAP Servers (Tier 2), e.g., ROLAP, MOLAP → serve → Clients (Tier 3): analysis, query/reporting, data mining.]
CS 336 4
• A data warehouse stores operational data and is expected to support a wide range of queries, from the relatively simple to the highly complex.
• The ability to answer particular queries depends on the types of end-user access tools available.
• General-purpose tools such as reporting and query tools can easily support ‘who?’ and ‘what?’ questions about past events.
• A typical query submitted directly to a data warehouse is: ‘What was the total revenue for Scotland in the third quarter of 2004?’
• Here we focus on a tool that can support more advanced queries, namely online analytical processing (OLAP).

OLAP
• OLAP is a term that describes a technology that uses a multi-dimensional view of aggregate data for the purposes of advanced analysis.
• OLAP enables decision-making about future actions.
• A typical OLAP calculation can be more complex than simply aggregating data.
– For example, ‘Compare the numbers of properties sold for each type of property in the different regions of Great Britain for each year since 2000.’

OLAP
• A data warehouse implementation without an OLAP tool is nearly unthinkable.
• Data warehousing and online analytical processing (OLAP) are essential elements of decision support.
• OLAP database servers use multi-dimensional structures to store data and the relationships between data.
• Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data.
– Each side of a cube is a dimension.

OLAP Applications
Functional area   Examples of OLAP applications
Finance           Budgeting, activity-based costing, financial performance analysis
Sales             Sales analysis and sales forecasting
Marketing         Market research analysis, sales forecasting, promotions analysis, customer analysis
Manufacturing     Production planning and defect analysis

Key Features of OLAP
• Described in the OLAP Council White Paper
(2001):
1. Multi-dimensional views of data;
2. Support for complex calculations;
3. Time intelligence.

OLAP Tools
• OLAP tools are categorized according to the architecture used to store and process multidimensional data.
• There are three main categories of OLAP tools, as defined by Berson and Smith (1997):
– Relational OLAP (ROLAP)
– Multidimensional OLAP (MOLAP)
– Hybrid OLAP (HOLAP)
Relational OLAP
• ROLAP servers are placed between the relational back-end server and the client front-end tools.
• To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
• This facilitates the creation of multiple multi-dimensional views of two-dimensional relations.
• ROLAP includes the following:
– Implementation of aggregation navigation logic.
– Optimization for each DBMS back-end.
– Additional tools and services.
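Aggregation navigation logic can be illustrated with a small sketch. The sketch assumes a ROLAP warehouse that keeps several pre-aggregated summary tables next to the base fact table. It then applies one rule: answer a query from the smallest table that still contains every dimension the query needs. The table names, row counts and the rule itself are hypothetical illustrations, not the behaviour of a specific product. The row counts match the small sale fact table (prodId, storeId, date, amt) used later in these slides.

```python
# Hypothetical summary tables kept by a ROLAP warehouse:
# name -> (dimensions the table still has, number of rows).
# Row counts match the small sale fact table used later in these slides.
SUMMARY_TABLES = {
    "sale_base": ({"prodId", "storeId", "date"}, 6),
    "sale_by_prod_store": ({"prodId", "storeId"}, 5),
    "sale_by_date": ({"date"}, 2),
    "sale_by_prod": ({"prodId"}, 2),
}


def choose_table(query_dims):
    """Pick the smallest table that still contains every dimension the query uses."""
    candidates = [
        (rows, name)
        for name, (dims, rows) in SUMMARY_TABLES.items()
        if set(query_dims) <= dims
    ]
    if not candidates:
        raise ValueError(f"no summary table covers {sorted(query_dims)}")
    return min(candidates)[1]  # fewest rows wins; ties are broken by name


print(choose_table({"date"}))            # sale_by_date
print(choose_table({"prodId"}))          # sale_by_prod
print(choose_table({"prodId", "date"}))  # sale_base
```

A query on product and date together can only be answered from the base table, because no summary keeps both of those dimensions.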

Architecture for ROLAP tools

Multidimensional OLAP
• MOLAP tools use specialized data structures and multi-dimensional database management systems (MDDBMSs) to organize, navigate, and analyze data.
• To enhance query performance, the data is typically aggregated and stored according to predicted usage.
• MOLAP data structures use array technology and efficient storage techniques.
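The array idea can be sketched as follows. Each dimension member is mapped to an index, and a cell's position in a single flat array is computed in row-major order, so a lookup needs no search. The member lists reuse the sample sale data from these slides. Using a plain dense Python list as the storage is an illustrative assumption; real MDDBMSs use their own compressed and sparse layouts.

```python
# Members of each dimension, in the order they are laid out along the array axes
# (taken from the sample sale data in these slides).
PRODUCTS = ["p1", "p2"]
STORES = ["s1", "s2", "s3"]
DAYS = [1, 2]


def offset(prod, store, day):
    """Row-major position of cell (prod, store, day) in a flat array."""
    i = PRODUCTS.index(prod)
    j = STORES.index(store)
    k = DAYS.index(day)
    return (i * len(STORES) + j) * len(DAYS) + k


# Dense storage: one slot for every possible cell; None where no sale exists.
cells = [None] * (len(PRODUCTS) * len(STORES) * len(DAYS))
for prod, store, day, amt in [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]:
    cells[offset(prod, store, day)] = amt

print(cells[offset("p1", "s3", 1)])    # 50
print(sum(c is None for c in cells))   # 6 of the 12 cells are empty
```

Half of the cells are empty even in this tiny example. This is why MOLAP engines typically rely on sparse-array storage rather than a fully dense layout.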
Architecture for MOLAP tools.

Multi-Dimensional Data

[Figure: a three-field table shown as a two-dimensional matrix, and a four-field table shown as a three-dimensional cube.]

The MOLAP Cube

Fact table view (sale):

prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8

Multi-dimensional cube:

        s1   s2   s3
p1      12        50
p2      11   8

dimensions = 2
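The step from the fact table view to the two-dimensional cube can be sketched in a few lines. Each (prodId, storeId, amt) row fills one cell of a products × stores grid. The function name `to_matrix` is hypothetical, and `None` stands in for the blank cells of the slide.

```python
SALE_2D = [("p1", "s1", 12), ("p2", "s1", 11), ("p1", "s3", 50), ("p2", "s2", 8)]


def to_matrix(facts, prods, stores):
    """Lay out (prodId, storeId, amt) rows as a prods x stores grid; None = no sale."""
    grid = {(p, s): None for p in prods for s in stores}
    for p, s, amt in facts:
        grid[(p, s)] = amt  # each (prodId, storeId) pair occurs at most once here
    return [[grid[(p, s)] for s in stores] for p in prods]


for prod, row in zip(["p1", "p2"], to_matrix(SALE_2D, ["p1", "p2"], ["s1", "s2", "s3"])):
    print(prod, row)  # p1 [12, None, 50], then p2 [11, 8, None]
```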

3-D Cube
Fact table view (sale):

prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Multi-dimensional cube:

day 1   s1   s2   s3
p1      12        50
p2      11   8

day 2   s1   s2   s3
p1      44   4
p2

dimensions = 3

Example
Dimensions: Time, Product, Store
Attributes: Product (upc, price, …); Store …
Hierarchies: Product → Brand → …; Day → Week → Quarter; Store → Region → Country

[Figure: a sales cube with axes Product (Juice, Milk, Coke, Cream, Soap, Bread), Store (NY, SF, LA) and Time (M, T, W, Th, F, S, S); arrows mark roll-up to brand, roll-up to region and roll-up to week. The highlighted cell means 56 units of bread sold in LA on M.]

Hybrid OLAP
• Hybrid OLAP is a combination of both ROLAP and MOLAP.
• Hybrid OLAP (HOLAP) tools provide limited analysis capability, either directly against RDBMS products or by using an intermediate MOLAP server.
• It offers the higher scalability of ROLAP and the faster computation of MOLAP.
• HOLAP servers can store large volumes of detailed information.
• The aggregations are stored separately in a MOLAP store.

Architecture for HOLAP Tools

MOLAP vs ROLAP
MOLAP                                            ROLAP
Information retrieval is fast.                   Information retrieval is comparatively slow.
Uses sparse arrays to store datasets.            Uses relational tables.
Best suited for inexperienced users, since it    Best suited for experienced users.
is very easy to use.
Maintains a separate database for data cubes.    May not require space other than that
                                                 available in the data warehouse.
DBMS facility is weak.                           DBMS facility is strong.

OLAP Operations
• Since OLAP servers are based on a multidimensional view of data, they support the following operations:
– Roll-up
– Drill-down
– Slice and dice
– Pivot (rotate)

Roll-up
• Roll-up performs aggregation on a data cube in either of the following ways:
– By climbing up a concept hierarchy for a dimension
– By dimension reduction
• In the example below, roll-up is performed by climbing up the concept hierarchy for the dimension location.
• When roll-up is performed by dimension reduction, one or more dimensions are removed from the data cube.

• Initially the concept hierarchy was "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
• The data is now grouped into countries rather than cities.
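The second form of roll-up listed above, dimension reduction, can be sketched on the small sale table (prodId, storeId, date, amt) that appears in the later aggregation slides. Removing the date dimension means summing amt over all dates for each (prodId, storeId) pair. The function name is hypothetical.

```python
from collections import defaultdict

SALE = [  # (prodId, storeId, date, amt)
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]


def roll_up_drop_date(facts):
    """Roll up by dimension reduction: remove the date dimension and sum amt."""
    totals = defaultdict(int)
    for prod, store, _date, amt in facts:
        totals[(prod, store)] += amt
    return dict(totals)


print(roll_up_drop_date(SALE))
```

For example, the cell for p1 at s1 becomes 12 + 44 = 56, because both days collapse into one value.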
Drill-down
• Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
– By stepping down a concept hierarchy for a dimension
– By introducing a new dimension
• In the example below, drill-down is performed by stepping down the concept hierarchy for the dimension time.
• It navigates from less detailed data to more detailed data.
• Initially the concept hierarchy was "day < month < quarter < year".
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
• When drill-down is performed by introducing a new dimension, one or more dimensions are added to the data cube.
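Drilling down by introducing a new dimension can be sketched as regrouping the detailed data with one more grouping column. This works because a finer view cannot be reconstructed from the coarse totals alone; the detailed rows are needed. The sketch reuses the sale table from the aggregation slides, and the helper `aggregate` is a hypothetical name for a minimal GROUP BY.

```python
from collections import defaultdict

SALE = [  # (prodId, storeId, date, amt)
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]
COLUMNS = ("prodId", "storeId", "date")


def aggregate(facts, keys):
    """Sum amt grouped by the named columns (a tiny GROUP BY)."""
    totals = defaultdict(int)
    for *dims, amt in facts:
        row = dict(zip(COLUMNS, dims))
        totals[tuple(row[k] for k in keys)] += amt
    return dict(totals)


by_date = aggregate(SALE, ["date"])                 # coarse: one total per day
by_date_prod = aggregate(SALE, ["date", "prodId"])  # drill-down: split each day by product
print(by_date)       # {(1,): 81, (2,): 48}
print(by_date_prod)  # {(1, 'p1'): 62, (1, 'p2'): 19, (2, 'p1'): 48}
```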
Aggregates
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Result: 81
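The query can be checked against a real SQL engine. The sketch below loads the sale table into an in-memory SQLite database through Python's standard `sqlite3` module and runs the slide's query unchanged, apart from the table name's case, which SQLite ignores. Using SQLite here is an illustrative choice, not part of the original material.

```python
import sqlite3

# In-memory copy of the sample sale fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

# The query from the slide: total amount for day 1.
total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]
print(total)  # 81
```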

Aggregates
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

ans:
date  sum
1     81
2     48
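The GROUP BY query can be run the same way with Python's `sqlite3` module, again on an in-memory copy of the table. An `ORDER BY date` is added in this sketch because SQL does not guarantee the order of rows produced by `GROUP BY` alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

# GROUP BY alone does not fix the output order, so ORDER BY is added here.
rows = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()
print(rows)  # [(1, 81), (2, 48)]
```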

Another Example
• Add up amounts by day, product
• In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Result:
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48

(Going from the sale table to this result is a roll-up; going back to the detailed rows is a drill-down.)
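Because SUM is distributive, a coarser aggregate can also be rolled up from a finer aggregate without rescanning the base table. The sketch below rolls up the (date, prodId) result to per-day and per-product totals. The helper `roll_up` is a hypothetical name. The same shortcut would not work directly for a holistic measure such as a median.

```python
from collections import defaultdict

# Result of the GROUP BY date, prodId query, keyed by (date, prodId).
by_date_prod = {(1, "p1"): 62, (1, "p2"): 19, (2, "p1"): 48}


def roll_up(agg, keep):
    """Roll up an aggregate by summing over every key position not listed in keep."""
    out = defaultdict(int)
    for key, amt in agg.items():
        out[tuple(key[i] for i in keep)] += amt
    return dict(out)


print(roll_up(by_date_prod, keep=[0]))  # {(1,): 81, (2,): 48}
print(roll_up(by_date_prod, keep=[1]))  # {('p1',): 110, ('p2',): 19}
```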

Cube Aggregation: Roll-up
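Example: computing sums

Cube (sale), one layer per day:
day 1   s1   s2   s3
p1      12        50
p2      11   8
day 2   s1   s2   s3
p1      44   4
...

Roll-up over days:
        s1   s2   s3
p1      56   4    50
p2      11   8

Roll-up over days and products:
        s1   s2   s3
sum     67   12   50

Roll-up over days and stores:
        sum
p1      110
p2      19

Roll-up over all dimensions: 129

(Reading downwards is a roll-up; reading upwards is a drill-down.)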

Cube Operators for Roll-up
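Cube layers day 1 and day 2 as in the previous example. Summing over all days, and writing '*' for "all values" of a dimension (arguments in the order store, product, date):

        s1   s2   s3   sum
p1      56   4    50   110
p2      11   8         19
sum     67   12   50   129

sale(s1,*,*) = 67
sale(s2,p2,*) = 8
sale(*,*,*) = 129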
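The cube operator notation can be read as a filtered sum. Each argument is either a specific member or '*', and '*' matches every value of that dimension. A minimal sketch, assuming the argument order store, product, date used above. The function `sale` computes the sum directly from the fact rows; a real engine would usually read precomputed cells instead.

```python
SALE = [  # (prodId, storeId, date, amt)
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]


def sale(store="*", prod="*", date="*"):
    """Cube operator sale(store, prod, date); '*' matches every value of that dimension."""
    return sum(
        amt
        for p, s, d, amt in SALE
        if store in ("*", s) and prod in ("*", p) and date in ("*", d)
    )


print(sale("s1", "*", "*"))   # 67
print(sale("s2", "p2", "*"))  # 8
print(sale("*", "*", "*"))    # 129
```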

Extended Cube

('*' is the "all" member added to each dimension; the * row and * column hold totals.)

day 1   s1   s2   s3   *
p1      12        50   62
p2      11   8         19
*       23   8    50   81

day 2   s1   s2   s3   *
p1      44   4         48
p2
*       44   4         48

day *   s1   s2   s3   *
p1      56   4    50   110
p2      11   8         19
*       67   12   50   129

sale(*,p2,*) = 19
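The extended cube can be precomputed in one pass over the facts. Each fact contributes to 2³ = 8 cells: every combination of its actual store, product and date values with '*'. A minimal sketch, assuming keys in the order (store, product, date). The function name `compute_cube` is hypothetical. This is the same idea that SQL's GROUP BY CUBE expresses.

```python
from collections import defaultdict
from itertools import product

SALE = [  # (prodId, storeId, date, amt)
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]


def compute_cube(facts):
    """Pre-compute every cell of the extended cube, using '*' for 'all values'."""
    cube = defaultdict(int)
    for p, s, d, amt in facts:
        # Every mix of the actual value and '*' on each of the three dimensions.
        for key in product((s, "*"), (p, "*"), (d, "*")):
            cube[key] += amt
    return dict(cube)


cube = compute_cube(SALE)
print(cube[("*", "p2", "*")])  # 19
print(cube[("*", "*", 1)])     # 81
print(cube[("s1", "*", "*")])  # 67
```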

Aggregation Using Hierarchies

Cube (store level):
day 1   s1   s2   s3
p1      12        50
p2      11   8
day 2   s1   s2   s3
p1      44   4

Hierarchy: store → region → country

Rolled up to region (summed over days):
        region A   region B
p1      56         54
p2      11         8
(store s1 in Region A; stores s2, s3 in Region B)
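Climbing the store → region hierarchy amounts to replacing each store by its region and summing. A minimal sketch, assuming the store-to-region mapping stated on the slide. The dictionary `REGION` and the function name are hypothetical, and the days are summed away as in the table.

```python
from collections import defaultdict

SALE = [  # (prodId, storeId, date, amt)
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]
REGION = {"s1": "A", "s2": "B", "s3": "B"}  # store -> region, as on the slide


def roll_up_store_to_region(facts):
    """Replace each store by its region, drop the date, and sum amt."""
    totals = defaultdict(int)
    for prod, store, _date, amt in facts:
        totals[(prod, REGION[store])] += amt
    return dict(totals)


print(roll_up_store_to_region(SALE))
```

Region B for p1 combines s3 (50) and s2 (4), giving 54. Rolling up again to the country level would need one more mapping of the same kind.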

Slice
• The slice operation performs a selection on one particular dimension of a given cube and provides a new sub-cube.

• In the example, slice is performed on the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube containing only the cells that satisfy this criterion.

Slicing
day 1   s1   s2   s3
p1      12        50
p2      11   8
day 2   s1   s2   s3
p1      44   4

Slice: TIME = day 1

        s1   s2   s3
p1      12        50
p2      11   8
Dice
• Dice performs a selection on two or more dimensions of a given cube and provides a new sub-cube.

• In the example, the dice operation on the cube uses the following selection criteria, which involve three dimensions:
– (location = "Toronto" or "Vancouver")
– (time = "Q1" or "Q2")
– (item = "Mobile" or "Modem")

Pivot

• The pivot operation is also known as rotation.
• It rotates the data axes in view in order to provide an alternative presentation of data.

Consider the following diagram, which shows the pivot operation.

Slicing & Pivoting

Sales ($ millions), sliced to time = d1 (the cube also holds d2):

Store     Product       d1
Store s1  Electronics   $5.2
          Toys          $1.9
          Clothing      $2.3
          Cosmetics     $1.1
Store s2  Electronics   $8.9
          Toys          $0.75
          Clothing      $4.6
          Cosmetics     $1.5

After pivoting (products as rows, stores as columns), d1:

Products      Store s1   Store s2
Electronics   $5.2       $8.9
Toys          $1.9       $0.75
Clothing      $2.3       $4.6
Cosmetics     $1.1       $1.5
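The pivot shown above changes only the layout, not the values. Cells keyed by (store, product) are regrouped so that products become rows and stores become columns. A minimal sketch using the d1 figures from the slide. The function name `pivot` is hypothetical.

```python
# d1 sales in $ millions, keyed by (store, product), as on the slide.
SALES_D1 = {
    ("s1", "Electronics"): 5.2, ("s1", "Toys"): 1.9,
    ("s1", "Clothing"): 2.3, ("s1", "Cosmetics"): 1.1,
    ("s2", "Electronics"): 8.9, ("s2", "Toys"): 0.75,
    ("s2", "Clothing"): 4.6, ("s2", "Cosmetics"): 1.5,
}


def pivot(cells):
    """Rotate the axes: (store, product) cells -> product rows with one column per store."""
    table = {}
    for (store, prod), value in cells.items():
        table.setdefault(prod, {})[store] = value
    return table


for prod, by_store in pivot(SALES_D1).items():
    print(f"{prod:<12} s1=${by_store['s1']}  s2=${by_store['s2']}")
```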
Summary of Operations
• Aggregation (roll-up)
– aggregate (summarize) data to the next higher
dimension element
– e.g., total sales by city, year → total sales by region, year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a sub-cube
– e.g., sales where city = ‘Vancouver’ and date = ‘1/15/90’
• Visualization operations (e.g., Pivot)
