Purpose of Online Analytical Processing (OLAP)
• Data warehouses bring together large volumes of data for the
purposes of data analysis.
• Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.
– There are two main types of access tools available to meet this
demand, namely online analytical processing (OLAP) and data
mining.
• These tools differ in what they offer the user and, because of this, they are complementary technologies.
• A data warehouse (or, more commonly, one or more data marts) and tools such as OLAP and/or data mining are collectively referred to as Business Intelligence (BI) technologies.
The Complete Decision Support System
[Figure: a three-tier decision support system (Tier 1, Tier 2, Tier 3). Data from operational DBs is extracted, transformed, loaded, refreshed, etc. into data marts; an OLAP server (e.g., ROLAP) serves the data to query/reporting and data mining tools.]
CS 336 4
• A data warehouse stores operational data and is expected
to support a wide range of queries from the relatively
simple to the highly complex.
• The ability to answer particular queries depends on the types of end-user access tools available.
• General-purpose tools such as reporting and query tools can easily support ‘who?’ and ‘what?’ questions about past events.
• A typical query submitted directly to a data warehouse is: ‘What was the total revenue for Scotland in the third quarter of 2004?’
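A query like this is a simple filter plus aggregate. The sketch below runs it with Python's built-in `sqlite3` module against a hypothetical `revenue` table; the table name, its columns (`region`, `year`, `quarter`, `amount`) and the figures are invented for illustration only.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical revenue table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revenue (region TEXT, year INTEGER, quarter INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?, ?)",
    [
        ("Scotland", 2004, 3, 120.0),
        ("Scotland", 2004, 3, 80.0),
        ("Scotland", 2004, 2, 50.0),  # different quarter: excluded
        ("Wales", 2004, 3, 70.0),     # different region: excluded
    ],
)

# 'What was the total revenue for Scotland in the third quarter of 2004?'
(total,) = conn.execute(
    "SELECT SUM(amount) FROM revenue WHERE region = ? AND year = ? AND quarter = ?",
    ("Scotland", 2004, 3),
).fetchone()
print(total)  # 200.0
```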
• Here we focus on a tool that can support more advanced queries, namely online analytical processing (OLAP).
OLAP
• OLAP is a technology that uses a multi-dimensional view of aggregate data for the purposes of advanced analysis.
• OLAP enables decision-making about future actions.
• A typical OLAP calculation can be more complex than simply aggregating data
– for example, ‘Compare the numbers of properties sold for each type of property in the different regions of Great Britain for each year since 2000.’
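The comparison needs counts along three dimensions at once (year, region, property type). A minimal sketch, assuming hypothetical sale records of the form `(year, region, property_type)`; the records and the `compare` helper are illustrative, not part of the original material.

```python
from collections import Counter

# Hypothetical sale records: (year, region, property_type), one per property sold.
SALES = [
    (2000, "Scotland", "Flat"), (2000, "Scotland", "House"),
    (2000, "Wales", "Flat"), (2001, "Scotland", "Flat"),
    (2001, "Scotland", "Flat"), (2001, "Wales", "House"),
]

# One count per (year, region, type) cell: a three-dimensional aggregate.
counts = Counter(SALES)

def compare(year, property_type, regions):
    """Numbers of one property type sold in each region in one year (0 if none)."""
    return {region: counts[(year, region, property_type)] for region in regions}

regions = sorted({r for _, r, _ in SALES})
for year in sorted({y for y, _, _ in SALES if y >= 2000}):
    for ptype in sorted({t for _, _, t in SALES}):
        print(year, ptype, compare(year, ptype, regions))
```

`Counter` returns 0 for a missing key, so empty cells of the cube show up as zero counts instead of raising an error.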
OLAP
• A data warehouse implementation without an OLAP
tool is nearly unthinkable.
• Data warehousing and online analytical processing (OLAP) are essential elements of decision support.
• OLAP database servers use multi-dimensional
structures to store data and relationships between
data.
• Multi-dimensional structures are best visualized as
cubes of data, and cubes within cubes of data.
– Each side of a cube is a dimension
OLAP Applications
[Table: functional areas with examples of OLAP applications]
Key Features of OLAP
• Described in the OLAP Council White Paper
(2001):
1. Multi-dimensional views of data;
2. Support for complex calculations;
3. Time intelligence.
OLAP Tools
• OLAP tools are categorized according to the
architecture used to store and process
multidimensional data
• Berson and Smith (1997) define three main categories of OLAP tools:
– Relational OLAP (ROLAP)
– Multidimensional OLAP (MOLAP)
– Hybrid OLAP (HOLAP)
Relational OLAP
• ROLAP servers are placed between the relational back-end server and client front-end tools.
• To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
• This facilitates the creation of multiple multi-dimensional views of two-dimensional relations.
• ROLAP includes the following:
– Implementation of aggregation navigation logic.
– Optimization for each DBMS back-end.
– Additional tools and services.
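Aggregation navigation logic lets a ROLAP server answer a query from the smallest pre-computed summary table that still contains the needed groupings. The sketch below is a simplified illustration under stated assumptions: the summary-table names, their row counts and the `DERIVABLE` hierarchy map are all hypothetical, and row count stands in for query cost.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SummaryTable:
    name: str
    group_by: frozenset  # dimension attributes the table is grouped by
    rows: int            # estimated size, used here as the query cost

# Hypothetical summary tables kept in the relational back end.
TABLES = [
    SummaryTable("sales_base", frozenset({"day", "product", "store"}), 1_000_000),
    SummaryTable("sales_by_month_product_region",
                 frozenset({"month", "product", "region"}), 50_000),
    SummaryTable("sales_by_month_region", frozenset({"month", "region"}), 600),
]

# Hypothetical hierarchy: a finer attribute can also answer for a coarser one.
DERIVABLE = {"day": {"day", "month"}, "store": {"store", "region"}}

def covers(table, attrs):
    """True if every requested attribute is in, or derivable from, the table's grouping."""
    available = set()
    for a in table.group_by:
        available |= DERIVABLE.get(a, {a})
    return attrs <= available

def navigate(query_attrs):
    """Pick the cheapest table that can answer a GROUP BY on query_attrs.

    min() raises ValueError if no table can answer the query.
    """
    candidates = [t for t in TABLES if covers(t, set(query_attrs))]
    return min(candidates, key=lambda t: t.rows).name

print(navigate({"region"}))            # sales_by_month_region
print(navigate({"product", "month"}))  # sales_by_month_product_region
print(navigate({"store"}))             # sales_base
```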
Architecture for ROLAP tools
Multidimensional OLAP
• MOLAP tools use specialized data structures and multi-dimensional database management systems (MDDBMSs) to organize, navigate, and analyze data.
• To enhance query performance, the data is typically aggregated and stored according to predicted usage.
• MOLAP data structures use array technology and efficient storage techniques that minimize disk space requirements through sparse data management.
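Array technology means a cell is located by arithmetic on member positions instead of by searching rows. A minimal sketch, assuming three small hypothetical dimensions and row-major layout; the sample amounts are the sale(prodId, storeId, date, amt) rows used in the aggregation slides. It also shows a pre-computed aggregate and a sparse (non-empty cells only) alternative.

```python
# Hypothetical dimension members; each member is mapped to an array index.
STORES = ["s1", "s2", "s3"]
PRODUCTS = ["p1", "p2"]
DAYS = [1, 2]
S = {m: i for i, m in enumerate(STORES)}
P = {m: i for i, m in enumerate(PRODUCTS)}
D = {m: i for i, m in enumerate(DAYS)}

def offset(store, product, day):
    """Row-major position of cell (store, product, day) in the flat array."""
    return (S[store] * len(PRODUCTS) + P[product]) * len(DAYS) + D[day]

# Dense array: one slot per possible cell, 0 where nothing was sold.
cells = [0] * (len(STORES) * len(PRODUCTS) * len(DAYS))
FACTS = [("s1", "p1", 1, 12), ("s1", "p2", 1, 11), ("s3", "p1", 1, 50),
         ("s2", "p2", 1, 8), ("s1", "p1", 2, 44), ("s2", "p1", 2, 4)]
for store, product, day, amt in FACTS:
    cells[offset(store, product, day)] += amt

# Pre-aggregate along a predicted access path (totals per store).
store_totals = {
    s: sum(cells[offset(s, p, d)] for p in PRODUCTS for d in DAYS) for s in STORES
}

# Sparse alternative: keep only non-empty cells, keyed by their offset.
sparse = {i: v for i, v in enumerate(cells) if v}

print(cells[offset("s3", "p1", 1)])  # 50
print(store_totals)                  # {'s1': 67, 's2': 12, 's3': 50}
print(len(sparse), "of", len(cells), "cells are non-empty")  # 6 of 12 cells are non-empty
```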
Architecture for MOLAP tools.
Multi-Dimensional Data
[Figure: a three-field table shown as a two-dimensional matrix, and a four-field table shown as a three-dimensional cube]
The MOLAP Cube
[Figure: a fact table viewed as a cube, dimensions = 2]
3-D Cube
[Figure: fact table view and the corresponding multi-dimensional cube, dimensions = 3]
Example
[Figure: a sales cube over Store (NY, SF, LA, …), Product (Juice, Milk, Coke, Cream, Soap, Bread, …) and Time (M, T, W, Th, F, S, S), annotated with roll-up to region, roll-up to brand and roll-up to week; one highlighted cell records 56 units of bread sold in LA on M.]
• Dimensions: Time, Product, Store
• Attributes: Product (upc, price, …), Store …
• Hierarchies:
– Product → Brand → …
– Day → Week → Quarter
– Store → Region → Country
Hybrid OLAP
• Hybrid OLAP is a combination of both ROLAP and
MOLAP.
• Hybrid OLAP (HOLAP) tools provide limited analysis
capability, either directly against RDBMS products, or
by using an intermediate MOLAP server
• It offers the higher scalability of ROLAP and the faster computation of MOLAP.
• HOLAP servers can store large volumes of detailed information.
• The aggregations are stored separately in a MOLAP store.
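The split described above (detail in a relational store, aggregates in a MOLAP store) can be sketched as a query router. This is a hypothetical illustration, not any product's design: `sqlite3` plays the relational back end, a Python dict plays the MOLAP store, and `holap_query` is an invented name.

```python
import sqlite3

# Relational side: detailed rows stay in an RDBMS table (ROLAP-style storage).
ROWS = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", ROWS)

# MOLAP side: aggregates by (prodId, date), pre-computed into an in-memory store.
molap_store = {}
for prod, _store, date, amt in ROWS:
    molap_store[(prod, date)] = molap_store.get((prod, date), 0) + amt

def holap_query(prod, date, store=None):
    """Hypothetical router: summary queries use the MOLAP store, detail queries use SQL."""
    if store is None:
        return molap_store.get((prod, date), 0)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amt), 0) FROM sale "
        "WHERE prodId = ? AND date = ? AND storeId = ?",
        (prod, date, store),
    ).fetchone()
    return total

print(holap_query("p1", 1))        # 62, from the aggregate store
print(holap_query("p1", 1, "s3"))  # 50, from the relational detail
```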
Architecture for HOLAP Tools
MOLAP vs ROLAP
• MOLAP: maintains a separate database for data cubes.
• ROLAP: may not require space other than that already available in the data warehouse.
OLAP Operations
• Since OLAP servers are based on a multidimensional view of data, they support the following OLAP operations:
– Roll-up
– Drill-down
– Slice and dice
– Pivot (rotate)
Roll-up
• Roll-up performs aggregation on a data cube in either of the following ways:
– By climbing up a concept hierarchy for a dimension
– By dimension reduction
• In the example, roll-up is performed by climbing up the concept hierarchy for the dimension location.
• When roll-up is performed by dimension reduction, one or more dimensions are removed from the data cube.
Initially the concept hierarchy was "street < city < province < country".
On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
The data is grouped into countries rather than cities.
Drill-down
• Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
– By stepping down a concept hierarchy for a dimension
– By introducing a new dimension
• In the example, drill-down is performed by stepping down the concept hierarchy for the dimension time.
• It navigates from less detailed data to more detailed data.
• Initially the concept hierarchy was "day < month < quarter < year".
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
• When drill-down is performed by introducing a new dimension, one or more dimensions are added to the data cube.
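Drilling down from quarter to month only works because the month-level data is kept. A minimal sketch, assuming hypothetical monthly sales figures and the hierarchy "month < quarter"; the helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical sales at the finest stored level (month).
monthly = {"Jan": 100, "Feb": 150, "Mar": 130, "Apr": 90, "May": 120, "Jun": 110}
# Concept hierarchy month < quarter.
QUARTER_OF = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
              "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def by_quarter(data):
    """Coarse view: aggregate months up to quarters."""
    totals = defaultdict(int)
    for month, amount in data.items():
        totals[QUARTER_OF[month]] += amount
    return dict(totals)

def drill_down(data, quarter):
    """Step down the time hierarchy: show the months that make up one quarter."""
    return {m: a for m, a in data.items() if QUARTER_OF[m] == quarter}

print(by_quarter(monthly))        # {'Q1': 380, 'Q2': 320}
print(drill_down(monthly, "Q1"))  # {'Jan': 100, 'Feb': 150, 'Mar': 130}
```

The month values for a quarter always add up to that quarter's total, which is what makes roll-up and drill-down inverse views of the same data.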
Aggregates
· Add up amounts for day 1
· In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Aggregates
· Add up amounts by day
· In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
Another Example
· Add up amounts by day, product
· In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

sale:
  prodId  storeId  date  amt
  p1      s1       1     12
  p2      s1       1     11
  p1      s3       1     50
  p2      s2       1     8
  p1      s1       2     44
  p1      s2       2     4

Result:
  prodId  date  amt
  p1      1     62
  p2      1     19
  p1      2     48

Going from sale to the result is a roll-up; going from the result back to sale is a drill-down.
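The three aggregate queries from these slides can be checked against the sale table above with Python's built-in `sqlite3` module. This is a sketch for illustration; the lowercase table name `sale` is a choice made here (SQLite identifiers are case-insensitive, so `SALE` works too), and `ORDER BY` clauses are added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# Add up amounts for day 1.
day1 = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date").fetchall()

# Add up amounts by day and product.
by_day_product = conn.execute(
    "SELECT prodId, date, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId").fetchall()

print(day1)            # 81
print(by_day)          # [(1, 81), (2, 48)]
print(by_day_product)  # [('p1', 1, 62), ('p2', 1, 19), ('p1', 2, 48)]
```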
Cube Aggregation: Roll-up
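Example: computing sums

Day 1:
        s1    s2    s3
  p1    12          50
  p2    11     8

Day 2:
        s1    s2    s3
  p1    44     4
  p2

Roll-up over days:
        s1    s2    s3
  p1    56     4    50
  p2    11     8

Roll-up further over products:
        s1    s2    s3
  sum   67    12    50

Roll-up over stores:
        sum
  p1    110
  p2     19

Grand total: 129
Roll-up moves toward these coarser totals; drill-down moves back toward the daily detail.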
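Each table in this example is obtained by summing out one or more dimensions (roll-up by dimension reduction). A minimal sketch over the example's sale facts, keyed `(store, product, day)`; `roll_up` is a hypothetical helper name.

```python
from collections import defaultdict

# Sale facts from the example: (store, product, day) -> amount.
SALE = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8, ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}
DIMS = ("store", "product", "day")

def roll_up(facts, keep):
    """Sum out every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, amt in facts.items():
        out[tuple(key[i] for i in idx)] += amt
    return dict(out)

by_store_product = roll_up(SALE, ("store", "product"))  # roll-up over days
by_store = roll_up(SALE, ("store",))                    # then over products
by_product = roll_up(SALE, ("product",))                # over stores and days
total = roll_up(SALE, ())                               # grand total

print(by_store_product[("s1", "p1")])  # 56
print(by_product)                      # {('p1',): 110, ('p2',): 19}
print(total)                           # {(): 129}
```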
Cube Operators for Roll-up
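Using the sales cube from the roll-up example, * stands for "all values" of a dimension (arguments in the order store, product, day):
• sale(s1,*,*) = 67, the total for store s1 over all products and days
• sale(s2,p2,*) = 8, the total for product p2 in store s2 over all days
• sale(*,*,*) = 129, the grand total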
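The `*` notation can be read as a wildcard lookup. A minimal sketch, assuming the argument order (store, product, day) inferred from the labels sale(s1,*,*) and sale(*,p2,*); the `sale` function below is a hypothetical illustration that scans the facts rather than reading pre-computed cells.

```python
# Sale facts from the example: (store, product, day) -> amount.
SALE = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8, ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}

def sale(store, product, day):
    """Cube operator: '*' in a position means 'all values' of that dimension."""
    query = (store, product, day)
    return sum(
        amt for key, amt in SALE.items()
        if all(q == "*" or q == k for q, k in zip(query, key))
    )

print(sale("s1", "*", "*"))   # 67
print(sale("s2", "p2", "*"))  # 8
print(sale("*", "*", "*"))    # 129
print(sale("*", "p2", "*"))   # 19
```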
Extended Cube
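Each layer adds a * ("all") row and column holding totals; the * layer sums over all days.

Day 1:
        s1    s2    s3    *
  p1    12          50    62
  p2    11     8          19
  *     23     8    50    81

Day 2:
        s1    s2    s3    *
  p1    44     4          48
  p2
  *     44     4          48

* (all days):
        s1    s2    s3    *
  p1    56     4    50    110
  p2    11     8          19
  *     67    12    50    129

Example cell: sale(*,p2,*) = 19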
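The extended cube materializes every combination of dimensions with * in some positions, i.e., all 2^3 group-bys at once (comparable to GROUP BY CUBE in SQL systems that support it). A minimal sketch over the example's facts, keyed `(store, product, day)`; `extended_cube` is a hypothetical helper name.

```python
from itertools import product as cartesian

# Sale facts from the example: (store, product, day) -> amount.
SALE = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8, ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}

def extended_cube(facts):
    """Materialize every group-by: any key position may hold '*' (all values)."""
    cube = {}
    for key, amt in facts.items():
        # Each fact contributes to 2**3 cells: itself and every '*' generalization.
        for mask in cartesian((False, True), repeat=len(key)):
            cell = tuple("*" if star else v for v, star in zip(key, mask))
            cube[cell] = cube.get(cell, 0) + amt
    return cube

cube = extended_cube(SALE)
print(cube[("*", "*", 1)])    # 81
print(cube[("s1", "*", 1)])   # 23
print(cube[("*", "p2", "*")]) # 19
print(cube[("*", "*", "*")])  # 129
```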
Aggregation Using Hierarchies
Hierarchy: store → region → country
(store s1 in Region A; stores s2, s3 in Region B)

Summing the day 1 and day 2 sales from the roll-up example by region:
        region A    region B
  p1       56          54
  p2       11           8
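Climbing from store to region is a lookup in the hierarchy followed by a sum. A minimal sketch using the example's facts and the store-to-region assignment given on the slide; the variable names are illustrative.

```python
from collections import defaultdict

# Sale facts from the example: (store, product, day) -> amount.
SALE = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8, ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}
# Concept hierarchy store < region, as given on the slide.
REGION_OF = {"s1": "A", "s2": "B", "s3": "B"}

by_region_product = defaultdict(int)
for (store, prod, _day), amt in SALE.items():
    # Sum over days and climb from store to region.
    by_region_product[(REGION_OF[store], prod)] += amt

print(dict(by_region_product))
# {('A', 'p1'): 56, ('A', 'p2'): 11, ('B', 'p1'): 54, ('B', 'p2'): 8}
```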
Slice
• The slice operation fixes one particular dimension of a given cube to a single value and provides a new sub-cube.
• Here, slice is performed for the dimension "time" using the criterion time = "Q1".
• The result is a new sub-cube that contains only the Q1 data.
Slicing
Slicing the sales cube (day 1 and day 2, as in the roll-up example) on TIME = day 1 keeps only the day 1 layer:
        s1    s2    s3
  p1    12          50
  p2    11     8
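Slicing keeps the cells whose value on one dimension matches the criterion and drops that dimension from the key. A minimal sketch over the example's facts; `slice_cube` is a hypothetical helper name.

```python
# Sale facts from the example: (store, product, day) -> amount.
SALE = {
    ("s1", "p1", 1): 12, ("s1", "p2", 1): 11, ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8, ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}
DIMS = ("store", "product", "day")

def slice_cube(facts, dim, value):
    """Fix one dimension to a single value; the result has one dimension fewer."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: amt for key, amt in facts.items() if key[i] == value}

day1 = slice_cube(SALE, "day", 1)
print(day1)  # {('s1', 'p1'): 12, ('s1', 'p2'): 11, ('s3', 'p1'): 50, ('s2', 'p2'): 8}
```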
Dice
• Dice applies selection criteria on two or more dimensions of a given cube and provides a new sub-cube.
• Here, the dice operation on the cube is based on the following selection criteria, which involve three dimensions:
– (location = "Toronto" or "Vancouver")
– (time = "Q1" or "Q2")
– (item = "Mobile" or "Modem")
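Dicing keeps the cells whose values fall in an allowed set on every constrained dimension. A minimal sketch applying the three criteria above; the facts (units sold) and the `dice` helper are hypothetical.

```python
# Hypothetical facts: (location, time, item) -> units sold.
FACTS = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q2", "Modem"): 20,
    ("Vancouver", "Q1", "Modem"): 30,
    ("Vancouver", "Q3", "Mobile"): 40,  # time not selected
    ("Chicago", "Q1", "Mobile"): 50,    # location not selected
    ("Toronto", "Q1", "Phone"): 60,     # item not selected
}
DIMS = ("location", "time", "item")

def dice(facts, criteria):
    """Keep cells whose value on each constrained dimension is in the allowed set."""
    idx = {DIMS.index(d): allowed for d, allowed in criteria.items()}
    return {k: v for k, v in facts.items()
            if all(k[i] in allowed for i, allowed in idx.items())}

sub = dice(FACTS, {
    "location": {"Toronto", "Vancouver"},
    "time": {"Q1", "Q2"},
    "item": {"Mobile", "Modem"},
})
print(sorted(sub))
# [('Toronto', 'Q1', 'Mobile'), ('Toronto', 'Q2', 'Modem'), ('Vancouver', 'Q1', 'Modem')]
```

Unlike a slice, the result keeps all three dimensions; only their ranges shrink.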
Pivot
• The pivot (rotate) operation rotates the data axes in view to provide an alternative presentation of the data.
[Figure: the pivot operation]
Slicing & Pivoting
Sales ($ millions) by Products, Store and Time (days d1, d2); the values for d1 are:

  Store s1   Electronics   $5.2
             Toys          $1.9
             Clothing      $2.3
             Cosmetics     $1.1
  Store s2   Electronics   $8.9
             Toys          $0.75
             Clothing      $4.6
             Cosmetics     $1.5

Slicing on d1 and pivoting puts products in rows and stores in columns:

  Products (d1)   Store s1   Store s2
  Electronics     $5.2       $8.9
  Toys            $1.9       $0.75
  Clothing        $2.3       $4.6
  Cosmetics       $1.1       $1.5
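Pivoting changes only the layout, not the values. A minimal sketch using the d1 sales above ($ millions), keyed `(store, product)`; `pivot` is a hypothetical helper that regroups the cells into product rows with one column per store.

```python
# Sales in $ millions for day d1, keyed (store, product), from the slide.
SALES_D1 = {
    ("s1", "Electronics"): 5.2, ("s1", "Toys"): 1.9,
    ("s1", "Clothing"): 2.3, ("s1", "Cosmetics"): 1.1,
    ("s2", "Electronics"): 8.9, ("s2", "Toys"): 0.75,
    ("s2", "Clothing"): 4.6, ("s2", "Cosmetics"): 1.5,
}

def pivot(cells):
    """Rotate the view: store-major rows become product rows with one column per store."""
    table = {}
    for (store, product), value in cells.items():
        table.setdefault(product, {})[store] = value
    return table

view = pivot(SALES_D1)
stores = sorted({s for s, _ in SALES_D1})
print("Product".ljust(12) + "".join(s.rjust(8) for s in stores))
for product, row in view.items():
    print(product.ljust(12) + "".join(f"{row[s]:8.2f}" for s in stores))
```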
Summary of Operations
• Aggregation (roll-up)
– aggregate (summarize) data to the next higher dimension element
– e.g., total sales by city and year → total sales by region and year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a sub-cube
– e.g., sales where city = ‘Vancouver’ and date = ‘1/15/90’
• Visualization operations (e.g., Pivot)