2
On-Line Analytical Processing (OLAP)
Too much Data! Too little Insight!
Bad Decisions!
Too Late!
Wrong Context!
4
OLAP?
The name On-Line Analytical Processing was coined in a
paper by E.F. Codd in 1993 (“Providing On-Line Analytical
Processing for User Analysts”)
A definition
OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight into data
through fast, consistent, interactive access to a wide variety
of possible views of information that has been transformed
from raw data to reflect the real dimensionality of the
enterprise as understood by the user
5
What is OLAP?
The OLAP Council defines On-Line Analytical
Processing as “a category of software technology that
enables analysts, managers, and executives to gain insight
into data through fast, consistent, interactive access to a
wide variety of possible views of information that has
been transformed from raw data to reflect the real
dimensionality of the enterprise as understood by the user.”
6
What is OLAP?
Typical application areas include:
Sales
Marketing
Management reporting
Budgeting
Forecasting
Financial reporting
and similar areas.
7
Need for Data Warehousing and OLAP
[Figure: Data Warehousing and OLAP]
8
Multi-Tiered Architecture
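[Figure: multi-tiered data warehouse architecture. Operational DBs and other sources are extracted, transformed, loaded, and refreshed (with a monitor & integrator and metadata) into the data warehouse and data marts; an OLAP server serves the front-end tools for analysis, query/reports, and data mining.]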
9
On-Line Analytical Processing (OLAP)
Front end to the data warehouse, allowing easy data manipulation.
10
OLAP Analysis
The ability to analyze metrics in different dimensions
such as time, geography, gender, product, etc
For example, sales for the company are up.
What region is most responsible for this increase?
Which store in this region is most responsible for the increase?
What particular product category or categories contributed the most to
the increase?
11
OLAP Analysis
A query that doesn't require OLAP:
• How much bread did we sell last month?
Queries that call for OLAP:
• How much large-size bran bread did we sell last month in the
Midwest, the Northeast, and the Southeast, compared with the same
month last year, actual vs. budget?
• What are the top 25 brands, by products, styles, and regions, for this
period for total US, based on sales in rupees?
• How much did we spend on promotions for customers who
purchased less than Rs. 1000 worth of products?
12
What is OLAP?
Analytical Processing: a set of functionalities
• Drill-Down
• Slice-n’-Dice
• Natural to users
• ‘Cubes’ of data that facilitate multi-dimensional data analysis
for faster, more informed decision making
13
What is OLAP?
• Data
• Detection
• Processes
• Relations
• Measurement
• Concepts
• Comparison
14
OLAP Goals
DATA
• Intensive
• Multi-Dimensional
• Variety of Relationships
• Complex Situations
OLAP TOOLS
• Faster Cognition
• Better Comprehension
• Better Communication
15
OLTP versus OLAP
                     OLTP                         OLAP
User                 Clerk, IT professional       Knowledge worker
Function             Day-to-day operations        Decision support
Database design      Application-oriented         Subject-oriented
                     (E-R based)                  (star, snowflake)
Data                 Current, isolated            Historical, consolidated
View                 Detailed, flat relational    Summarized, multidimensional
Usage                Structured, repetitive       Ad hoc
Unit of work         Short, simple transaction    Complex query
Access               Read/write                   Read mostly
Operations           Index/hash on primary key    Lots of scans
# Records accessed   Tens                         Millions
# Users              Thousands                    Hundreds
Database size        100s MB-GB                   100s GB-TB
Performance metric   Transaction throughput       Query throughput, response
16
F. A. S. M. I. Concept
FAST : Delivers most responses to users within 5
seconds.
Complete applications or very functional, complete toolkits:
• Typically aimed at specific vertical or horizontal markets.
• High cost per seat.
Client-based OLAP products:
• Easy to deploy.
• Low cost per seat.
• Limited functionality and capacity.
19
OLAP Market Shares
[Pie chart: OLAP market share. Segment labels were not preserved; segment values are 21.3%, 21.3%, 15.2%, 12.1%, 7.1%, 6.9%, 6.6%, 2.5%, 2.5%, 2.4%, and 2.1%.]
Time Warner
Sales/Marketing Forecasting and Analysis
• Supply/Demand Forecasting
• Market Analysis
• Customer Analysis
• Market Segmentation
• Promotions Analysis
Supports users on three continents with a strategic market planning and
analysis system using an OLAP database server.
21
OLAP Success Stories
22
OLAP
23
Strengths of OLAP
24
OLAP Applications
25
On-Line Analytical Processing (OLAP)
26
Functionality
OLAP Cube:
Cubes are data processing units composed of fact tables and dimensions from the
data warehouse.
They provide multidimensional views of data, querying and analytical capabilities
to clients
27
Conceptual Model
[Figure: data cube with Product (TV, PC, PVR, sum) and Country (U.S.A, Canada, Mexico, sum) dimensions; the apex cell is ALL.]
28
OLAP: Data Cube
OLAP takes a snapshot of a set of source data and restructures it into an
OLAP cube
The cube is created from a star schema or snowflake schema of tables
[Figure: sales data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A, Canada, Mexico, sum). Callouts: “Overall sales of TV’s in the US”; “Date in 3rd quarter”.]
29
Typical OLAP Operations
Roll up (drill-up): summarize data
by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
from higher level summary to lower level summary or detailed data,
or introducing new dimensions
Slice and dice:
project and select
Other operations
drill across: involving (across) more than one fact table
drill through: through the bottom level of the cube to its back-end
relational tables (using SQL)
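The operations above can be sketched on a tiny in-memory fact table. This is only an illustration: the rows, the column names, and the helper functions `aggregate` and `select` are hypothetical, and a real OLAP server would run these operations against a cube or a star schema rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, country, product, sales)
FACTS = [
    ("Q1", "Jan", "USA", "TV", 10),
    ("Q1", "Feb", "USA", "PC", 20),
    ("Q1", "Mar", "Canada", "TV", 5),
    ("Q2", "Apr", "USA", "TV", 15),
    ("Q2", "May", "Canada", "PC", 25),
]
COLUMNS = ("quarter", "month", "country", "product")


def aggregate(rows, group_by):
    """Project onto the named columns and sum the sales measure."""
    idx = [COLUMNS.index(c) for c in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def select(rows, **allowed):
    """Keep rows whose named columns take one of the allowed values."""
    return [
        r for r in rows
        if all(r[COLUMNS.index(c)] in values for c, values in allowed.items())
    ]


# Drill down: detailed view by month and country.
by_month = aggregate(FACTS, ["month", "country"])
# Roll up by climbing the time hierarchy (month -> quarter).
by_quarter = aggregate(FACTS, ["quarter", "country"])
# Roll up by dimension reduction (drop the time dimension entirely).
by_country = aggregate(FACTS, ["country"])
# Slice and dice: select values, then project onto the dimensions to view.
tv_in_usa = aggregate(select(FACTS, country={"USA"}, product={"TV"}), ["quarter"])
```

Here `by_quarter` holds coarser totals than `by_month`, and `tv_in_usa` restricts the data to one product and one country before grouping by quarter.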
30
OLAP: Data Cube Operations
Slicing:
Selecting the dimensions of the cube to be viewed.
Example: View “Sales volume” as a function of “Product” by “Country” by
“Quarter”
Dicing:
Specifying the values along one or more dimensions.
Example: View “Sales volume” for “Product=PC” by “Country” by “Quarter”
31
Slicing and Dicing
[Figure: a cube of color (Red, Blue, Gray) by state (CA, OR, WA) by month (Jul, Aug, Sep), shown alongside a Blue slice and a Total (rolled-up) view.]
32
Querying the Data Cube
Cross-tabulation
“Cross-tab” for short
Report data grouped by 2 dimensions
Aggregate across other dimensions
Include subtotals
Operations on a cross-tab
Roll up (further aggregation)
Drill down (less aggregation)

Number of Autos Sold
         CA    OR    WA    Total
Jul      45    33    30    108
Aug      50    36    42    128
Sep      38    31    40    109
Total    133   100   112   345
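As a minimal sketch of how the subtotals in such a cross-tab arise, the snippet below recomputes the row, column, and grand totals from the nine cell values in the "Number of Autos Sold" table. The function name `subtotals` and the dictionary layout are illustrative choices, not something the slides prescribe.

```python
# Cell values from the "Number of Autos Sold" cross-tab: (month, state) -> autos
CELLS = {
    ("Jul", "CA"): 45, ("Jul", "OR"): 33, ("Jul", "WA"): 30,
    ("Aug", "CA"): 50, ("Aug", "OR"): 36, ("Aug", "WA"): 42,
    ("Sep", "CA"): 38, ("Sep", "OR"): 31, ("Sep", "WA"): 40,
}


def subtotals(cells):
    """Return (row totals, column totals, grand total) for a 2-D cross-tab."""
    row_totals, col_totals = {}, {}
    for (row, col), value in cells.items():
        row_totals[row] = row_totals.get(row, 0) + value
        col_totals[col] = col_totals.get(col, 0) + value
    return row_totals, col_totals, sum(cells.values())


row_totals, col_totals, grand_total = subtotals(CELLS)
```

Rolling up one step further collapses either margin into the grand total; drilling down would split each cell by a third dimension such as color.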
33
Roll Up and Drill Down
         CA    OR    WA    Total
Red      40    29    40    109
Blue     45    31    37    113
Gray     48    40    35    123
Total    133   100   112   345
34
“Dicing” - Ranging of Data
[Figure: Sales Volumes cube with dimensions MODEL (Mini Van, Coupe, Sedan), DEALERSHIP (Clyde, Gleason, Carr), and COLOR (Blue, Red, White). The “diced” data is the sub-cube labeled Coupe, Clyde, Blue and White.]
35
“Slicing” - Rotation of Data
[Figure: the same cube shown from two rotated views: Model (Van, Coupe, Sedan) by Color (Blue, Red, White), and Model (Van, Coupe, Sedan) by Dealership (Miller, Clyde, Smith).]
36
“Standard” Data Cube Query
Measurements
Which fact(s) should be reported?
Filters
What slice(s) of the cube should be used?
Grouping attributes
How finely should the cube be diced?
Each dimension is either:
(a) A grouping attribute
(b) Aggregated over (“Rolled up” into a single total)
n dimensions → 2^n sets of grouping attributes
Aggregation = projection to a lower-dimensional subspace
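The count of grouping sets follows from each dimension being either kept as a grouping attribute or rolled up. A short sketch, assuming three hypothetical dimension names, enumerates them:

```python
from itertools import combinations


def grouping_sets(dimensions):
    """List every subset of dimensions that can serve as grouping attributes."""
    return [
        combo
        for k in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, k)
    ]


# Hypothetical dimensions for illustration.
sets_for_three = grouping_sets(["month", "state", "color"])
```

`sets_for_three` starts with the full grouping `("month", "state", "color")` and ends with the empty tuple, which stands for the grand total with every dimension rolled up.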
37
Full Data Cube with Subtotals
Pre-computation of aggregates → fast answers to OLAP queries
Ideally, pre-compute all 2^n types of subtotals
Otherwise, perform aggregation as needed
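A minimal sketch of full pre-computation, assuming a handful of made-up fact rows and using the marker `ALL` for a rolled-up dimension: every fact contributes to all 2^n combinations of "keep" or "roll up", so any subtotal later becomes a single dictionary lookup.

```python
from itertools import product

ALL = "ALL"  # marker for a dimension that has been rolled up


def build_cube(facts, n_dims):
    """Pre-compute the sum for every combination of kept/rolled-up dimensions."""
    cube = {}
    for *coords, measure in facts:
        for mask in product((False, True), repeat=n_dims):
            key = tuple(ALL if rolled else c for c, rolled in zip(coords, mask))
            cube[key] = cube.get(key, 0) + measure
    return cube


# Hypothetical facts: (month, state, color, autos sold)
FACTS = [
    ("Jul", "CA", "Red", 10),
    ("Jul", "CA", "Blue", 5),
    ("Aug", "OR", "Red", 7),
]
cube = build_cube(FACTS, 3)
```

The trade-off is storage: each fact row is touched 2^n times, which is why the alternative on the slide is to aggregate on demand.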
38
OLAP: Data Cube Operations
39
Data Warehouse Usage
40
Three Data Warehouse Models
Enterprise warehouse
collects all of the information about subjects spanning the entire
organization
Data Mart
a subset of corporate-wide data that is of value to specific groups of
users. Its scope is confined to specific, selected groups, such as a marketing
data mart
Independent vs. dependent (sourced directly from the warehouse) data mart
41
Typical OLAP Operations
42
Views and Decision Support
43
OLAP Terminology
A data cube supports viewing/modelling of a variable (a set of
variables) of interest. Measures are used to report the values of
the particular variable with respect to a given set of dimensions.
A fact table stores measures as well as keys representing
relationships to various dimensions.
Dimensions are perspectives with respect to which an
organization wants to keep record.
A star schema defines a fact table and its associated
dimensions.
Remark: OLAP technology is frequently used in data warehouses
44
Where Does OLAP Fit In? (1)
45
Where Does OLAP Fit In? (2)
Information is conceptually viewed as “cubes” for simplifying the
way in which users access, view, and analyze data.
Quantitative values are known as “facts” or “measures.”
e.g., sales $, units sold, etc.
Descriptive categories are known as “dimensions.”
46
OLAP FASMI Test
Fast: Delivers information to the user at a fairly constant rate. Most queries
should be delivered to the user in five seconds or less.
Analysis: Performs basic numerical and statistical analysis of the data, pre-
defined by an application developer or defined ad hoc by the user.
Shared: Implements the security requirements necessary for sharing
potentially confidential data across a large user population.
Multi-dimensional: The essential characteristic of OLAP.
Information: Accesses all the data and information necessary and relevant
for the application, wherever it may reside and not limited by volume.
...from the OLAP Report by Pendse and Creeth.
47
Need for Multidimensional Analysis
A simple analysis
How many units of product A did we sell in the store in
DHA, Lahore?
Typically, decision support requires more complex
analyses
How much revenue did the new product X generate during the last
three months, broken down by individual months, in the Southern
Region, by individual stores, broken down by the promotions,
compared to estimates, and compared to the previous version of the
product?
48
Kinds of Analyses
Roll-ups to provide summaries and aggregates along
the hierarchies of the dimensions
Drill-downs from the top level to the lowest along the
hierarchies of the dimensions
Calculations involving facts and metrics
Algebraic equations involving key performance
indicators
Moving averages and growth percentages
Trend analyses using statistical methods
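Two of the calculations named above, moving averages and growth percentages, can be sketched in a few lines. The monthly figures are invented for illustration, and the function names are not taken from any OLAP product.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def growth_percent(values):
    """Period-over-period growth, in percent, for consecutive values."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]


# Hypothetical monthly sales figures.
monthly_sales = [100, 120, 90, 150]
smoothed = moving_average(monthly_sales, 2)
growth = growth_percent(monthly_sales)
```

In an OLAP tool these would typically be defined once as calculated measures and then evaluated at whatever level of the time hierarchy the user is viewing.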
49
50
51
52
OLAP Features
53
Dimensional Analysis (1)
54
Dimensional Analysis (2)
55
Some Queries
Display the total sales of all products for past five years
in all stores
Compare total sales for all stores, product by product,
between years 2000 and 1999.
Show comparison of sales by individual stores, product
by product, between years 2000 and 1999 only for
those products with reduced sales.
Show the results of the previous queries, but with rows and columns
rotated.
56
Hypercubes
Multi-dimension cubes
Hard to visualize and display beyond three dimensions
Multi-dimensional domain structure (MDS)
Represents each dimension as a line showing the values
57
MDS
58
Display of Hypercubes
59
60
61
Drill-Down and Roll-Up
62
Slice-and-Dice or Rotation
63
OLAP Models/Implementations
MOLAP: OLAP implemented with a multi-dimensional
database.
64
ROLAP and MOLAP
65
MOLAP Implementations
OLAP has historically been implemented through use of multi-
dimensional databases (MDDs).
Dimensions are key business factors for analysis:
geographies (zip, state, region,...)
products (item, product category, product department,...)
dates (day, week, month, quarter, year,...)
Very high performance via fast look-up into “cube” data structure
to retrieve pre-calculated results.
“Cube” data structures allow pre-calculation of aggregate results
for each possible combination of dimensional values.
Use of application programming interface (API) for access via
front-end tools.
66
67
MOLAP Implementations
68
MOLAP Implementations
Typically outperform relational database technology because all answers are pre-
computed into cubes (and overhead for accessing cubes is very low).
Difficult to scale because of combinatorial explosion in the number and size of cubes
when dimensions of significant cardinality are required.
Going beyond tens (sometimes small hundreds) of thousands of entries in a single dimension
will break the MOLAP model because the pre-computed cube model does not work
well when the cubes are very sparsely populated.
See www.olapreport.com/DataExplosion.htm
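The scaling problem can be made concrete with a little arithmetic. The cardinalities and fact count below are hypothetical, chosen only to show how quickly the number of potential cells outgrows the number of facts that actually exist.

```python
from math import prod


def cube_cells(cardinalities):
    """Number of cells in a dense cube: the product of dimension sizes."""
    return prod(cardinalities)


def density(n_facts, cardinalities):
    """Fraction of cube cells that would actually hold a value."""
    return n_facts / cube_cells(cardinalities)


# Hypothetical cube: 100,000 items x 1,000 stores x 365 days, 100 million facts.
dims = [100_000, 1_000, 365]
total_cells = cube_cells(dims)
fill = density(100_000_000, dims)
```

With these numbers the cube has 36.5 billion potential cells but fewer than 1% of them are populated, which illustrates the sparsity that the slide warns about.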
69
Virtual Cubes
Virtual cubes are used when there is a need to join information
from two dissimilar cubes that share one or more common
dimensions.
Similar to a relational view; two (or more) cubes are linked
along common dimension(s).
Often used to save space by eliminating redundant storage of
information.
70
Partitioned Cubes
One logical cube of data can be spread across
multiple physical cubes on separate (or same)
servers.
The divide-and-conquer approach of partitioned
cubes helps to mitigate the scalability limitations of
a MOLAP environment.
Ideal cube partitioning is completely invisible to
end users.
71
ROLAP Implementations
72
73
Simplified Third Normal Form (Retail)
[Figure: simplified third-normal-form retail schema; tables are linked by one-to-many (1:M) relationships. Tables shown, with their columns:
(ZONE, REGION)
(ZIP, ZONE)
zip_x_SMSA (ZIP, SMSA)
zip_x_adi (ZIP, ADI)
year (QTR, YR)
quarter (WEEK, QTR)
week (DATE, WEEK)
(STORE #, ADDRESS, ZIP, ...)
(RECEIPT #, STORE #, DATE, ...)
(STORE #, DATE, WEATHER)
sale_detail (ITEM #, RECEIPT #, ..., $)
item_x_category (ITEM #, CATEGORY)
item_x_mfctr (ITEM #, MFCTR)
category_x_dept (CATEGORY, DEPT)]
74
Simplified Star Schema
Fact table: (ITEM#, RECEIPT#, STORE#, DATE, ..., $)
Dimension tables (each 1:M to the fact table):
(ITEM#, CATEGORY, DEPT, MFCTR, ...)
(STORE#, DATE, WEATHER)
A vastly simplified model; it may even summarize out receipt #.
75
Simplified Star Schema
A vastly simplified physical data model!
76
Star Schema for High Performance
Assume:
4 Billion rows in fact table.
20 different kinds (size, color, style) of raincoats
(product category) out of 50,000 UPCs in store.
8 stores out of 400 are in BOSTON SMSA.
2 years of POS history in DBMS.
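To see why the order of evaluation matters, it helps to estimate how few fact rows a query on raincoats in the Boston stores actually needs. The sketch below uses the figures assumed above and adds one assumption that the slides do not state: sales are spread uniformly across UPCs and stores.

```python
FACT_ROWS = 4_000_000_000
RAINCOAT_UPCS, TOTAL_UPCS = 20, 50_000
BOSTON_STORES, TOTAL_STORES = 8, 400


def expected_rows(fact_rows, *selectivities):
    """Scale the row count by each (matching, total) pair in turn.

    Assumes uniform distribution, which real sales data rarely follows.
    """
    rows = fact_rows
    for matching, total in selectivities:
        rows = rows * matching // total
    return rows


raincoat_rows = expected_rows(FACT_ROWS, (RAINCOAT_UPCS, TOTAL_UPCS))
target_rows = expected_rows(
    FACT_ROWS, (RAINCOAT_UPCS, TOTAL_UPCS), (BOSTON_STORES, TOTAL_STORES)
)
```

Under that assumption only about 32,000 of the 4 billion rows are relevant, so a plan that filters on both dimensions before touching the fact table reads a tiny fraction of the data.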
77
Star Schema for High Performance
Simple (poor performance) approach to query execution:
78
Star Schema for High Performance
79
Forcing a Cartesian Product Join
80
Forcing a Cartesian Product Join
Sample code:
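-- Total sales for one trade style in one state over a date range.
select sum(d_sales_detail.sales_amt)
from d_sales_detail
,store
,item
,period
-- Join the fact table to each dimension table.
where d_sales_detail.store_id = store.store_id
and d_sales_detail.item_id = item.item_id
and d_sales_detail.day_dt = period.day_dt
-- Filters on the dimension tables.
and period.day_dt between '23-NOV-2000' and '24-DEC-2000'
and item.trade_style_cd = 'BARBIE'
and store.state_cd = 'CA'
-- join_value predicates link the dimension tables to each other, intended
-- to let the optimizer combine them before touching the fact table.
and store.join_value = period.join_value
and store.join_value = item.join_value
and period.join_value = item.join_value
;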
81
Star Schema for High Performance
Problem: What if I want to know raincoat sales in first week of
January regardless of store?
82
Star Schema for High Performance
Bottom Line:
It is not at all unusual to obtain an order of
magnitude (or more) in performance advantage
using a star schema with advanced indexing versus
a more traditional relational database
implementation.
Despite what vendors may tell you, star schemas
cannot be effectively implemented for all DSS
business applications and/or data models.
83
ROLAP
Relational OLAP often makes heavy use of
summary tables to provide near instantaneous
access for multi-dimensional queries.
The foundation is usually a star schema or
snowflake database design.
Allows OLAP with much larger data sets than
multi-dimensional database (MDD) products
using cube structures (MOLAP).
84
ROLAP
Number of summary tables can get very large if
discipline is not enforced...
85
ROLAP
Summary tables in a naive implementation require all
combinations of the dimensions at each aggregation
level...
            Store   Zone   District   Region   All Stores
All Days      13     19       24        28         30
Year           9     15       22        27         29
Quarter        6     11       18        23         26
Period         4      8       14        21         25
Week           2      5       10        17         20
Day            1      3        7        12         16
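The 30 numbered cells in the grid are simply every pairing of a time level with a store level. A short sketch makes the count explicit; the third hierarchy, `ITEM_LEVELS`, is a hypothetical addition to show how the number multiplies with each extra dimension.

```python
from itertools import product

TIME_LEVELS = ["Day", "Week", "Period", "Quarter", "Year", "All Days"]
STORE_LEVELS = ["Store", "Zone", "District", "Region", "All Stores"]
# Hypothetical third hierarchy, not shown on the slide.
ITEM_LEVELS = ["Item", "Category", "Department", "All Items"]

two_dim_tables = list(product(TIME_LEVELS, STORE_LEVELS))
three_dim_tables = list(product(TIME_LEVELS, STORE_LEVELS, ITEM_LEVELS))
```

Two hierarchies give 6 x 5 = 30 candidate summary tables; adding a four-level item hierarchy would raise that to 120, which is why a naive implementation gets out of hand.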
87
ROLAP
Warning: Do not assume that dimensions are always simple hierarchies.
Example: Items are not just category, subcategory, SKU, and atomic
item.... what about trade styles or manufacturer?
Calendar vs. accounting period vs. billing cycle can be even worse...
88
ROLAP
89
Intelligent Aggregation Selection
Maximum performance boost implies lots of disk
for every pre-calculation.
Minimum performance boost implies no disk with
zero pre-calculation.
Strategy is to use meta data to heuristically
determine optimum set of aggregates from which all
other aggregates can be derived.
90
Aggregate Wizards
91
Fact Table Aggregates
Enhance performance on common queries at
coarser granularities.
Save space to permit storing more history than
possible with finer granularities.
Take advantage of need to store other facts
(with similar samples) at a particular
granularity.
92
Aggregate Advice
Coarser granularity decreases potential
cardinality, but usually increases density (e.g.,
daily summary table is typically twice the size
of weekly summary table - not seven times).
Strongly consider omitting candidate aggregates
where expected cardinality is more than 10%
of that of the next finer granularity stored.
Keep the detail for drill down, even if you
deploy aggregates for performance.
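The 10% rule of thumb above is easy to express as a check. This is a sketch only: the function name, the row counts, and treating the threshold as a hard cutoff are all illustrative assumptions, whereas the slide merely says to "strongly consider" omitting such aggregates.

```python
def worth_building(candidate_rows, finer_rows, threshold=0.10):
    """Return True if the candidate aggregate is small enough to be useful.

    candidate_rows: expected row count of the candidate aggregate
    finer_rows: row count of the next finer granularity already stored
    """
    if finer_rows <= 0:
        raise ValueError("finer_rows must be positive")
    return candidate_rows / finer_rows <= threshold


# Hypothetical sizes: a weekly table half the size of the daily table
# saves little, while a monthly table at 8% of it is a good candidate.
weekly_ok = worth_building(500_000, 1_000_000)
monthly_ok = worth_building(80_000, 1_000_000)
```

The first example mirrors the slide's observation that a daily summary is often only about twice the size of the weekly one, so the weekly aggregate buys little.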
93
Bottom Line
There are many implementation techniques for
delivery of an OLAP environment.
94
MOLAP Vs. ROLAP
95
Implementation Issues
Data design and preparation
Administration and performance
OLAP platforms
OLAP tools and products
96
Data Design and Preparation
Characteristics of data
Stores and uses much less data compared to a DW
Data is summarized. You will rarely find data at the lowest
levels of detail as in the DW
Data is more flexible for processing and analysis partly
because there is much less data to work with
Every instance of the OLAP system in your environment is
customized for the purpose that instance serves
OLAP data is generally customized
Types and levels
Static and dynamic summary data
Permanent and transient detailed data
97
Administration
Administering and managing OLAP systems should be
handled together with that of the DW environment
Some considerations
Expectations on what data would be accessed and how
Selection of the right business dimensions
Selection of the right filters in loading data from the DW
Choosing the aggregation, summarization, and precalculation
Size of the multidimensional database
Access and security privileges
Backup and restore facilities
Drill-through to the data warehouse; drill-through to another
OLAP instance
98
Performance
OLAP takes most of the queries that would normally
run against the DW
OLAP is designed for complex queries; so it should
enhance overall query performance of the DW
environment
OLAP can precalculate and pre-aggregate data for
quick response
99
OLAP Platforms
Usually, the data warehouse and OLAP systems reside
on the same platform at the start. Later, when the data
warehouse becomes large and OLAP is a common task,
the OLAP system is moved to another platform
A separate platform is needed when
The size and usage of the DW consumes all resources
Many departmental users desire OLAP capabilities
The stability and performance of OLAP degrades
OLAP tools require a different platform configuration than
the DW
100
Virtual Cubes
Example: Build a list price cube that can be used to compute discounts
given across many stores in a retail chain without redundant storage of
the list price data through use of a virtual cube.
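The list price example can be sketched with two small dictionaries standing in for cubes. All names and numbers here are hypothetical: one cube stores list price by item only, the other stores units and revenue by item and store, and the "virtual cube" is a computed view that links them on the shared item dimension.

```python
# Cube 1 (hypothetical): item -> list price, stored once for the whole chain.
LIST_PRICE = {"raincoat": 50.0, "umbrella": 20.0}

# Cube 2 (hypothetical): (item, store) -> (units sold, actual revenue).
SALES = {
    ("raincoat", "Boston"): (10, 450.0),
    ("umbrella", "Boston"): (5, 90.0),
    ("raincoat", "Chicago"): (4, 200.0),
}


def discount_view(sales, list_price):
    """Virtual cube: discount given = units * list price - actual revenue."""
    return {
        (item, store): units * list_price[item] - revenue
        for (item, store), (units, revenue) in sales.items()
    }


discounts = discount_view(SALES, LIST_PRICE)
```

Nothing is copied into the sales cube: the list price lives in one place and the discount is derived on request, much like a relational view.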
101
Partitioned Cubes
104
OLTP vs. OLAP
            OLTP                      OLAP
User        Clerk, IT professional    Knowledge worker
Function    Day-to-day operations     Decision support
107