
OLAP

CS 524 – Data Warehousing


Outline of Today’s Class
 Basic OLAP queries
 Data cubes
 Slice and dice, drill down, roll up
 MOLAP vs. ROLAP
 SQL OLAP Extensions: ROLLUP, CUBE

2
On-Line Analytical Processing (OLAP)

Analysis is simplifying, breaking down things into parts, picking out
strands and elements. Analysis is comparing unknown things with things
that are known. Analysis also involves picking out relationships and
putting them back together as a whole. – Edward de Bono
3
The Work Environment Today…

Too much Data! Too little Insight!

Bad Decisions!

Too Late! Wrong Context!

4
OLAP?
 The name On-Line Analytical Processing was coined in a
paper by E.F. Codd in 1993 (“Providing On-Line Analytical
Processing for User Analysts”)
 A definition
 OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight into data
through fast, consistent, interactive access to a wide variety
of possible views of information that has been transformed
from raw data to reflect the real dimensionality of the
enterprise as understood by the user

5
What is OLAP?
• The OLAP Council defines On-Line Analytical
Processing as “a category of software technology that
enables analysts, managers, and executives to gain insight
into data through fast, consistent, interactive access to a
wide variety of possible views of information that has
been transformed from raw data to reflect the real
dimensionality of the enterprise as understood by the user.”

6
What is OLAP?

Typical applications of OLAP are in business reporting for

Sales
Marketing
Management reporting
Budgeting
Forecasting
Financial reporting
and similar areas.
7
Need of Data Warehousing and OLAP

Data Warehousing
• Decision support requires historical data, which operational databases do not typically maintain
• Decision support requires consolidation (aggregation, summarization) of data

OLAP
• Unacceptable performance during execution of complex OLAP queries
• Multidimensional data model is not supported by the DBMS

8
Multi-Tiered Architecture
[Figure: multi-tiered architecture in four tiers. Data Sources: operational DBs and other sources, fed through Extract, Transform, Load, Refresh. Data Storage: data warehouse and data marts, with metadata and a monitor & integrator. OLAP Engine: OLAP server. Front-End Tools: analysis, query, reports, data mining.]

9
On-Line Analytical Processing (OLAP)
• Front-end to the data warehouse, allowing easy data manipulation
• Allows conducting inquiries over the data at various levels of abstraction
• Fast and easy because some aggregations are computed in advance
• An approach to quickly answering analytical queries that are dimensional in nature
• No need to formulate the entire query

10
OLAP Analysis
• The ability to analyze metrics in different dimensions
such as time, geography, gender, product, etc.
• For example, sales for the company are up.
  - What region is most responsible for this increase?
  - Which store in this region is most responsible for the increase?
  - What particular product category or categories contributed the most to
the increase?

• Answering these types of questions in order means that you
are performing an OLAP analysis.

11
OLAP Analysis
• A query that doesn't require OLAP:
  - How much bread did we sell last month?

• Queries that require OLAP:
  - How much large-size bran bread did we sell last month in the
Midwest, the Northeast, and the Southeast, compared with the same
month last year, actual vs. budget?
  - What are the top 25 brands, by products, styles, and regions, for this
period for total US based on sales rupees?
  - How much did we spend on promotional expenses for customers who
purchased less than Rs. 1,000 worth of products?

12
What is OLAP?

On-Line Analytical Processing: a set of functionalities that facilitate multi-dimensional data analysis for faster, more informed decision making.
• Multiple views
• Drill-Down
• Slice-n’-Dice
• Natural to users
• ‘Cubes’ of data

[Figure labels: Data, Detection, Processes, Relations, Measurement, Concepts, Comparison]
14
OLAP Goals

[Figure: DATA flows through OLAP TOOLS to four outcomes.]

DATA
• Intensive
• Multi-Dimensional
• Variety of Relationships
• Complex Situations
• Context / Focus

OLAP TOOLS lead to
• Faster Cognition
• Better Comprehension
• Better Communication
• Better Decision-Making

15
OLTP versus OLAP

                     OLTP                               OLAP
User                 Clerk, IT Professional             Knowledge Worker
Function             Day-to-day Operations              Decision Support
Database Design      Application-oriented (E-R based)   Subject-oriented (Star, Snowflake)
Data                 Current, Isolated                  Historical, Consolidated
View                 Detailed, Flat Relational          Summarized, Multidimensional
Usage                Structured, Repetitive             Ad-Hoc
Unit of Work         Short, Simple Transaction          Complex Query
Access               Read / Write                       Read Mostly
Operations           Index/Hash on Prim. Key            Lots of Scans
# Records Accessed   Tens                               Millions
# Users              Thousands                          Hundreds
Database Size        100s MB-GB                         100s GB-TB
Performance Metric   Transaction Throughput             Query Throughput, Response

16
F. A. S. M. I. Concept
• FAST: Delivers most responses to users within 5 seconds.

• ANALYSIS: Copes with relevant business logic and statistical analyses, yet is simple enough for the target user.

• SHARED: Handles multiple updates securely and quickly.

• MULTIDIMENSIONAL: Provides a conceptual view of data along several dimensions.
17
OLAP Flavors
Multidimensional and hybrid databases:
• Unbundled, high performance multidimensional or hybrid databases (HOLAP).
• Include multi-user data updating.

Applications and toolkits:
• Applications or very functional, complete toolkits.
• Typically aimed at specific vertical or horizontal markets.
• High cost per seat.

Client-based OLAP:
• Complete products.
• Easy to deploy.
• Low cost per seat.
• Limited functionality and capacity.

Relational:
• All data & metadata is stored in a standard RDBMS.
• Can handle large data volumes.
• Complex & expensive to implement.
• Slow query performance.
18
Major OLAP Vendors

19
OLAP Market Shares

[Figure: pie chart of OLAP market share by vendor. Legend: Hyperion Solutions, Microsoft, Cognos, Oracle, MicroStrategy, Business Objects, Applix, Cartesis/PwC, Comshare, IBM, Other. Slice labels: 2.1%, 15.2%, 2.4%, 21.3%, 2.5%, 2.5%, 6.6%, 21.3%, 6.9%, 7.1%, 12.1%.]
20
OLAP Success Stories

Time Warner

Support users in three continents with a strategic market planning and analysis system using an OLAP database server.

• Sales/Marketing
  - Forecasting and Analysis
  - Supply/Demand Forecasting
  - Market Analysis
  - Customer Analysis
  - Market Segmentation
  - Promotions Analysis

21
OLAP Success Stories

Barclay’s Bank

Used OLAP to manage risk individually in real time on loans based on ever-changing factors, internal and external.

• Finance/Accounting
  - Budgeting
  - Activity-Based Costing
  - Financial Performance
  - Customer/Product Profitability

"Access to this information gives the salespeople a tremendous advantage because they don't need to worry about a mortgage being declined after an offer has been made. They know exactly what the customer's limit is."
-Pankaj Mistry, data warehouse manager

22
OLAP

 Nature of OLAP Analysis


 Aggregation -- (total sales, percent-to-total)
 Comparison -- Budget vs. Expenses
 Ranking -- Top 10, quartile analysis
 Access to detailed and aggregate data
 Complex criteria specification
 Visualization
 Need interactive response to aggregate queries

23
Strengths of OLAP

• It is a powerful visualization tool
• It provides fast, interactive response times
• It is good for analyzing time series
• It can be useful to find some clusters and outliers
• Many vendors offer OLAP tools

24
OLAP Applications

 Marketing & Sales Analysis


 Consumer Goods Industries, Retailers
 Financial Services (Banks, Insurance etc.)

 Clickstream Analysis & Web Analytics


 Pure Play E-commerce Sites
 Click-n’-Mortar Organizations

 Database Marketing & CRM


 Customer Segmentation
 Customer Value Analysis

25
On-Line Analytical Processing (OLAP)

• Front-end to the data warehouse, allowing easy data manipulation
• Allows conducting inquiries over the data at various levels of abstraction
• Fast and easy because some aggregations are computed in advance
• No need to formulate the entire query

26
Functionality

• OLAP takes a snapshot of a set of source data and restructures it into an OLAP cube
• For complex queries OLAP can produce an answer in
around 0.1% of the time for the same query on OLTP
relational data.
• The cube is created from a star schema or
snowflake schema of tables

OLAP Cube:
Cubes are data processing units composed of fact tables and dimensions from the
data warehouse.
They provide multidimensional views of data, plus querying and analytical capabilities,
to clients.
27
Conceptual Model

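[Figure: data cube with dimensions Date (quarters 1 to 4, sum), Product (TV, PC, PVR, sum), and Country (U.S.A., Canada, Mexico, sum). The highlighted cell holds the total annual sales of TV in the U.S.A.; the corner cell ALL aggregates over every dimension.]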

28
OLAP: Data Cube
OLAP takes a snapshot of a set of source data and restructures it into an
OLAP cube
The cube is created from a star schema or snowflake schema of tables

[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum). The highlighted cell holds the overall sales of TVs in the US in the 3rd quarter.]

29
Typical OLAP Operations
 Roll up (drill-up): summarize data
 by climbing up hierarchy or by dimension reduction
 Drill down (roll down): reverse of roll-up
 from higher level summary to lower level summary or detailed data,
or introducing new dimensions
 Slice and dice:
 project and select
 Other operations
 drill across: involving (across) more than one fact table
 drill through: through the bottom level of the cube to its back-end
relational tables (using SQL)

30
OLAP: Data Cube Operations

• Slicing: selecting the dimensions of the cube to be viewed.
  - Example: View “Sales volume” as a function of “Product” by “Country” by “Quarter”

• Dicing: specifying the values along one or more dimensions.
  - Example: View “Sales volume” for “Product = PC” by “Country” by “Quarter”
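
The two operations, as this slide defines them, can be sketched in a few lines of Python. The fact rows and the `view` helper below are hypothetical and exist only for illustration; the dimension names follow the slide's example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, country, quarter, sales_volume)
FACTS = [
    ("PC", "U.S.A.", "Q1", 10),
    ("PC", "Canada", "Q1", 4),
    ("TV", "U.S.A.", "Q1", 7),
    ("PC", "U.S.A.", "Q2", 12),
    ("TV", "Mexico", "Q2", 3),
]
DIMS = ("product", "country", "quarter")


def view(facts, group_by, where=None):
    """Sum sales volume grouped by `group_by`, keeping rows that match `where`."""
    where = where or {}
    idx = {name: i for i, name in enumerate(DIMS)}
    totals = defaultdict(int)
    for row in facts:
        if all(row[idx[d]] == v for d, v in where.items()):
            key = tuple(row[idx[d]] for d in group_by)
            totals[key] += row[-1]
    return dict(totals)


# Slicing (in this slide's sense): choose which dimensions to view.
by_product_country = view(FACTS, ("product", "country"))

# Dicing (in this slide's sense): fix Product = PC, then view by country and quarter.
pc_only = view(FACTS, ("country", "quarter"), where={"product": "PC"})
```

Dimensions left out of `group_by` are simply summed over, which is why the same helper serves both operations.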

31
Slicing and Dicing

[Figure: cube of sales by color (Red, Blue, Gray), state (CA, OR, WA), and month (Jul, Aug, Sep), shown next to a view restricted to Blue and a view of totals across colors.]

32
Querying the Data Cube
• Cross-tabulation
  - “Cross-tab” for short
  - Report data grouped by 2 dimensions
  - Aggregate across other dimensions
  - Include subtotals
• Operations on a cross-tab
  - Roll up (further aggregation)
  - Drill down (less aggregation)

Number of Autos Sold
        CA    OR    WA   Total
Jul     45    33    30    108
Aug     50    36    42    128
Sep     38    31    40    109
Total  133   100   112    345
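
As a minimal sketch of how the subtotals in this cross-tab arise, the Python below recomputes them from the nine month-by-state cells given on the slide. The `cross_tab` function name is an assumption made for illustration.

```python
# Cell values taken from the "Number of Autos Sold" cross-tab on this slide.
SALES = {
    ("Jul", "CA"): 45, ("Jul", "OR"): 33, ("Jul", "WA"): 30,
    ("Aug", "CA"): 50, ("Aug", "OR"): 36, ("Aug", "WA"): 42,
    ("Sep", "CA"): 38, ("Sep", "OR"): 31, ("Sep", "WA"): 40,
}


def cross_tab(cells):
    """Return (row_totals, column_totals, grand_total) for a 2-D cross-tab."""
    row_totals, col_totals = {}, {}
    for (row, col), value in cells.items():
        row_totals[row] = row_totals.get(row, 0) + value
        col_totals[col] = col_totals.get(col, 0) + value
    return row_totals, col_totals, sum(cells.values())


rows, cols, grand = cross_tab(SALES)
print(rows)   # {'Jul': 108, 'Aug': 128, 'Sep': 109}
print(cols)   # {'CA': 133, 'OR': 100, 'WA': 112}
print(grand)  # 345
```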

33
Roll Up and Drill Down

Number of Autos Sold (by month and state)
        CA    OR    WA   Total
Jul     45    33    30    108
Aug     50    36    42    128
Sep     38    31    40    109
Total  133   100   112    345

Roll up by Month:

Number of Autos Sold
        CA    OR    WA   Total
       133   100   112    345

Drill down by Color:

Number of Autos Sold
        CA    OR    WA   Total
Red     40    29    40    109
Blue    45    31    37    113
Gray    48    40    35    123
Total  133   100   112    345
34
“Dicing” - Ranging of Data

[Figure: sales-volume cube with axes MODEL (Mini Van, Coupe, Sedan), DEALERSHIP (Clyde, Gleason, Carr), and COLOR (Blue, Red, White). The “diced” data is the sub-cube for Coupe, dealership Clyde, in colors Blue and White.]
35
“Slicing” - Rotation of Data

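Rotate the data cube by 90°

[Figure: the same cube shown twice. Before rotation the front face is Model (Van, Coupe, Sedan) by Color (Blue, Red, White); after rotation it is Model (Van, Coupe, Sedan) by Dealership (Miller, Clyde, Smith).]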

36
“Standard” Data Cube Query
 Measurements
 Which fact(s) should be reported?
 Filters
 What slice(s) of the cube should be used?
 Grouping attributes
 How finely should the cube be diced?
 Each dimension is either:
 (a) A grouping attribute
 (b) Aggregated over (“Rolled up” into a single total)
 n dimensions → 2^n sets of grouping attributes
 Aggregation = projection to a lower-dimensional subspace
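
Since each dimension is either grouped on or aggregated over, the candidate groupings are exactly the subsets of the dimension list. A minimal sketch that enumerates them (the dimension names are placeholders borrowed from the earlier cube examples):

```python
from itertools import combinations


def grouping_sets(dimensions):
    """Yield every subset of `dimensions`; each subset is one set of grouping attributes."""
    for size in range(len(dimensions), -1, -1):
        for subset in combinations(dimensions, size):
            yield subset


sets_ = list(grouping_sets(("product", "country", "quarter")))
print(len(sets_))  # 8, i.e. 2**3
```

The empty tuple at the end of the list is the grand total, where every dimension has been rolled up.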

37
Full Data Cube with Subtotals
 Pre-computation of aggregates → fast answers to OLAP queries
 Ideally, pre-compute all 2^n types of subtotals
 Otherwise, perform aggregation as needed
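
A rough sketch of full pre-computation, assuming the cube fits in memory as a Python dictionary: every input row is added to each of the 2^n subtotal cells it belongs to. The `ALL` marker and `build_cube` name are illustrative choices; the cell values reuse the month-by-state auto sales figures from the cross-tab slide.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for a dimension that has been aggregated over


def build_cube(rows, n_dims):
    """Pre-compute every subtotal: each dimension is either kept or replaced by ALL."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        for size in range(n_dims + 1):
            for kept in combinations(range(n_dims), size):
                key = tuple(dims[i] if i in kept else ALL for i in range(n_dims))
                cube[key] += measure
    return dict(cube)


# (month, state, autos sold) cells from the cross-tab slide.
ROWS = [
    ("Jul", "CA", 45), ("Jul", "OR", 33), ("Jul", "WA", 30),
    ("Aug", "CA", 50), ("Aug", "OR", 36), ("Aug", "WA", 42),
    ("Sep", "CA", 38), ("Sep", "OR", 31), ("Sep", "WA", 40),
]
cube = build_cube(ROWS, 2)
print(cube[("Jul", ALL)])  # 108
print(cube[(ALL, "CA")])   # 133
print(cube[(ALL, ALL)])    # 345
```

Once built, any subtotal is a single dictionary lookup; the cost is that each row is touched 2^n times during the build.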

38
OLAP: Data Cube Operations

• Drilling down: from higher level aggregation to lower level aggregation or detailed data (e.g., viewing by “state” after viewing by “region”)

• Rolling up: summarize data by climbing up the hierarchy or by dimension reduction (e.g., viewing by “region” instead of by “state”)
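
The state/region example can be sketched as follows. The state-to-region mapping and the NY and MA figures are hypothetical; the CA, OR, and WA totals are the ones from the auto sales cross-tab.

```python
from collections import defaultdict

# Hypothetical hierarchy (state -> region), for illustration only.
STATE_TO_REGION = {"CA": "West", "OR": "West", "WA": "West", "NY": "East", "MA": "East"}
# CA/OR/WA totals come from the cross-tab slide; NY and MA are made up.
SALES_BY_STATE = {"CA": 133, "OR": 100, "WA": 112, "NY": 90, "MA": 60}


def roll_up(sales_by_child, child_to_parent):
    """Climb one level of the hierarchy by summing children into their parent."""
    totals = defaultdict(int)
    for child, value in sales_by_child.items():
        totals[child_to_parent[child]] += value
    return dict(totals)


def drill_down(sales_by_child, child_to_parent, parent):
    """Return the child-level detail behind one parent-level total."""
    return {c: v for c, v in sales_by_child.items() if child_to_parent[c] == parent}


by_region = roll_up(SALES_BY_STATE, STATE_TO_REGION)
west_detail = drill_down(SALES_BY_STATE, STATE_TO_REGION, "West")
print(by_region)    # {'West': 345, 'East': 150}
print(west_detail)  # {'CA': 133, 'OR': 100, 'WA': 112}
```

Note that drill-down needs the finer-grained data to still be available; it cannot be reconstructed from the rolled-up totals alone.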

39
Data Warehouse Usage

 Three kinds of data warehouse applications


 Information processing
 supports querying, basic statistical analysis, and reporting using
crosstabs, tables, charts and graphs
 Analytical processing and Interactive Analysis
 multidimensional analysis of data warehouse data
 supports basic OLAP operations, slice-dice, drilling, pivoting
 Data mining
 knowledge discovery from hidden patterns
 supports associations, constructing analytical models, performing
classification and prediction, and presenting the mining results using
visualization tools.
 Differences among the three tasks

40
Three Data Warehouse Models
 Enterprise warehouse
 collects all of the information about subjects spanning the entire
organization
 Data Mart
 a subset of corporate-wide data that is of value to a specific group of
users. Its scope is confined to specific, selected groups, such as a marketing
data mart
 Independent vs. dependent (directly from warehouse) data mart

41
Typical OLAP Operations

 Roll up (drill-up): summarize data


 by climbing up hierarchy or by dimension reduction
 Drill down (roll down): reverse of roll-up
 from higher level summary to lower level summary or detailed data, or
introducing new dimensions
 Slice and dice:
 project and select
 Pivot (rotate):
 reorient the cube, visualization, 3D to series of 2D planes.
 Other operations
 drill across: involving (across) more than one fact table
 drill through: through the bottom level of the cube to its back-end relational
tables (using SQL)

42
Views and Decision Support

 OLAP queries are typically aggregate queries.


 Precomputation is essential for interactive response times.
 The CUBE is in fact a collection of aggregate queries, and
precomputation is especially important: lots of work on what is best
to precompute given a limited amount of space to store precomputed
results.
 Warehouses can be thought of as a collection of
asynchronously replicated tables and periodically maintained
views.
 Has renewed interest in view maintenance!

43
OLAP Terminology
 A data cube supports viewing/modelling of a variable (a set of
variables) of interest. Measures are used to report the values of
the particular variable with respect to a given set of dimensions.
 A fact table stores measures as well as keys representing
relationships to various dimensions.
 Dimensions are perspectives with respect to which an
organization wants to keep record.
 A star schema defines a fact table and its associated
dimensions.
Remark: OLAP technology is frequently used in data warehouses

44
Where Does OLAP Fit In? (1)

OLAP = On-line analytical processing.


 OLAP is a characterization of applications, not a database
design technique.
 Idea is to provide very fast response time in order to
facilitate iterative decision-making.
 Analytical processing requires access to complex
aggregations (as opposed to record-level access).

45
Where Does OLAP Fit In? (2)
Information is conceptually viewed as “cubes” for simplifying the
way in which users access, view, and analyze data.
 Quantitative values are known as “facts” or “measures.”
 e.g., sales $, units sold, etc.
 Descriptive categories are known as “dimensions.”

 e.g., geography, time, product, scenario (budget or actual), etc.


 Dimensions are often organized in hierarchies that represent levels of detail
in the data (e.g., UPC, SKU, product subcategory, product category, etc.).

46
OLAP FASMI Test
Fast: Delivers information to the user at a fairly constant rate. Most queries
should be delivered to the user in five seconds or less.
Analysis: Performs basic numerical and statistical analysis of the data, pre-
defined by an application developer or defined ad hoc by the user.
Shared: Implements the security requirements necessary for sharing
potentially confidential data across a large user population.
Multi-dimensional: The essential characteristic of OLAP.
Information: Accesses all the data and information necessary and relevant
for the application, wherever it may reside and not limited by volume.
...from the OLAP Report by Pendse and Creeth.

47
Need for Multidimensional Analysis
 A simple analysis
 How many units of product A did we sell in the store in
DHA, Lahore?
 Typically, decision support requires more complex
analyses
 How much revenue did the new product X generate during the last
three months, broken down by individual months, in the Southern
Region, by individual stores, broken down by the promotions,
compared to estimates, and compared to the previous version of the
product?

48
Kinds of Analyses
 Roll-ups to provide summaries and aggregates along
the hierarchies of the dimensions
 Drill-downs from the top level to the lowest along the
hierarchies of the dimensions
 Calculations involving facts and metrics
 Algebraic equations involving key performance
indicators
 Moving averages and growth percentages
 Trend analyses using statistical methods

49
50
51
OLAP Features

53
Dimensional Analysis (1)

54
Dimensional Analysis (2)

55
Some Queries
 Display the total sales of all products for the past five years
in all stores
 Compare total sales for all stores, product by product,
between years 2000 and 1999.
 Show comparison of sales by individual stores, product
by product, between years 2000 and 1999 only for
those products with reduced sales.
 Show the results of the previous queries, but rotating
the columns with rows

56
Hypercubes
 Multi-dimension cubes
 Hard to visualize and display beyond three dimensions
 Multi-dimensional domain structure (MDS)
 Represents each dimension as a line showing the values

57
MDS

58
Display of Hypercubes

59
60
61
Drill-Down and Roll-Up

62
Slice-and-Dice or Rotation

63
OLAP Models/Implementations
MOLAP: OLAP implemented with a multi-dimensional database.

ROLAP: OLAP implemented with a relational database.

HOLAP: OLAP implemented with a hybrid of multi-dimensional and relational database technologies.

DOLAP: OLAP implemented for desktop decision support environments.

64
ROLAP and MOLAP

65
MOLAP Implementations
OLAP has historically been implemented through use of multi-
dimensional databases (MDDs).
 Dimensions are key business factors for analysis:
 geographies (zip, state, region,...)
 products (item, product category, product department,...)
 dates (day, week, month, quarter, year,...)
 Very high performance via fast look-up into “cube” data structure
to retrieve pre-calculated results.
 “Cube” data structures allow pre-calculation of aggregate results
for each possible combination of dimensional values.
 Use of application programming interface (API) for access via
front-end tools.

66
67
MOLAP Implementations

Need to consider both maintenance and storage implications when designing a strategy for when to build cubes.

• Maintenance Considerations: Every data item received into the MDD must be aggregated into every cube (assuming “to-date” summaries are maintained).

• Storage Considerations: Although cubes get much smaller (i.e., denser) as dimensions get less detailed (e.g., year vs. day), the storage implications of building hundreds of cubes can be significant.

68
MOLAP Implementations

 Typically outperform relational database technology because all answers are pre-
computed into cubes (and overhead for accessing cubes is very low).
 Difficult to scale because of combinatorial explosion in the number and size of cubes
when dimensions of significant cardinality are required.
 Going beyond tens (sometimes small hundreds) of thousands of entries in a single dimension
will break the MOLAP model, because the pre-computed cube model does not work
well when the cubes are very sparse in the population of individual cells.
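
The sparsity problem can be made concrete with a little arithmetic: the number of cells in a cube is the product of the dimension cardinalities, so it grows much faster than the data. The sketch below uses hypothetical sizes (50,000 items, 400 stores, 730 days, and an assumed 100 million populated cells).

```python
from math import prod


def cube_cells(cardinalities):
    """Total number of cells in a cube with the given dimension cardinalities."""
    return prod(cardinalities)


def cube_density(cardinalities, populated_cells):
    """Fraction of the cube's cells that actually hold data."""
    return populated_cells / cube_cells(cardinalities)


# Hypothetical sizes, for illustration only.
CARDINALITIES = (50_000, 400, 730)   # items x stores x days
POPULATED = 100_000_000              # assumed number of non-empty cells

total = cube_cells(CARDINALITIES)
density = cube_density(CARDINALITIES, POPULATED)
print(total)  # 14600000000
```

Under these assumptions fewer than 1% of the cells hold data, so a dense array layout would spend almost all of its space on empty cells.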

See www.olapreport.com/DataExplosion.htm

69
Virtual Cubes
Virtual cubes are used when there is a need to join information
from two dissimilar cubes that share one or more common
dimensions.
 Similar to a relational view; two (or more) cubes are linked
along common dimension(s).
 Often used to save space by eliminating redundant storage of
information.

Example: Build a list price cube that can be used to compute
discounts given across many stores in a retail chain without
redundant storage of the list price data through use of a
virtual cube.

70
Partitioned Cubes
 One logical cube of data can be spread across
multiple physical cubes on separate (or same)
servers.
 The divide-and-conquer approach of partitioned
cubes helps to mitigate the scalability limitations of
a MOLAP environment.
 Ideal cube partitioning is completely invisible to
end users.

71
ROLAP Implementations

Advances in database technologies and front-end tools have begun
to allow deployment of OLAP using ANSI SQL RDBMS
implementations.
• ROLAP facilitates deployment of much larger dimension tables
than MOLAP implementations.
• Front-end tools facilitate GUI access to multi-dimensional
analysis capabilities.
• Aggregate awareness allows exploitation of pre-built summary
tables for some front-end tools.
Star schema designs are often used to facilitate OLAP against
relational databases.

72
73
Simplified Third Normal Form (Retail)
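[Figure: entity-relationship diagram of a simplified third-normal-form retail model, with one-to-many (1:M) links between tables. Geography: (ZONE, REGION), (ZIP, ZONE), zip_x_SMSA (ZIP, SMSA), zip_x_adi (ZIP, ADI), store (STORE #, ADDRESS, ZIP, ...). Calendar: year (QTR, YR), quarter (WEEK, QTR), week (DATE, WEEK). Sales: receipt (RECEIPT #, STORE #, DATE, ...), sale_detail (ITEM #, RECEIPT #, ..., $), weather (STORE #, DATE, WEATHER). Product: item_x_category (ITEM #, CATEGORY), item_x_mfctr (ITEM #, MFCTR), category_x_dept (CATEGORY, DEPT).]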

74
Simplified Star Schema

Geography Dimension Table: STORE#, ADDRESS, ZIP, ADI, SMSA, ZONE, REGION

Calendar Dimension Table: DATE, WEEK, QUARTER, YEAR, ...

Product Dimension Table: ITEM#, CATEGORY, DEPT, MFCTR, ...

Weather table: STORE#, DATE, WEATHER

Fact Table: ITEM#, RECEIPT#, STORE#, DATE, ..., $

Each of the tables above has a one-to-many (1:M) relationship to the fact table.

A vastly simplified model ... may even summarize out receipt # .....

75
Simplified Star Schema
A vastly simplified physical data model!

Collapse dimensional hierarchies into a single table
for each dimension and create a single fact table
from the header and detail records:
• Fewer tables.
• Fewer joins to get results.

76
Star Schema for High Performance

Business question: How many $ in raincoats did I sell
in the first week of January through stores in
Boston?

Assume:
• 4 billion rows in fact table.
• 20 different kinds (size, color, style) of raincoats
(product category) out of 50,000 UPCs in store.
• 8 stores out of 400 are in BOSTON SMSA.
• 2 years of POS history in DBMS.

77
Star Schema for High Performance
Simple (poor performance) approach to query execution:

1. Join item table with filtering on raincoat product
category (very selective) to fact table.
2. Join date table with filtering by week (next most
selective) to result table.
3. Join store table with filtering on store to result table
from step 2.
4. Aggregate.

78
Star Schema for High Performance

Advanced (better performance) approach to query execution:

1. Cartesian product join between dimensional tables.
   * Result is 20 x 8 x 7 = 1,120 rows.

2. Use composite index on item:store:day into fact table for
very selective access.
   * Access less than 0.000008 percent of the data in the fact table
     (a fraction below 0.00000008), assuming sales are spread evenly:
     20/50,000 items x 8/400 stores x 7/730 days.

Sophisticated cost-based optimizers will figure this out.
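
The arithmetic behind these figures can be checked with a short calculation. It uses the assumptions stated for this example (4 billion fact rows, 20 of 50,000 UPCs, 8 of 400 stores, 7 days) plus two assumptions of its own: that two years of history is 730 days and that sales are spread evenly across items, stores, and days.

```python
FACT_ROWS = 4_000_000_000
ITEMS, TOTAL_ITEMS = 20, 50_000
STORES, TOTAL_STORES = 8, 400
DAYS, TOTAL_DAYS = 7, 730  # assumption: 2 years of history = 730 days

# Rows produced by the Cartesian product of the filtered dimension tables.
index_probes = ITEMS * STORES * DAYS

# Fraction of the fact table matching all three filters, assuming uniform data.
selectivity = (ITEMS / TOTAL_ITEMS) * (STORES / TOTAL_STORES) * (DAYS / TOTAL_DAYS)
expected_rows = FACT_ROWS * selectivity

print(index_probes)  # 1120
print(f"{selectivity:.2e} of the fact table, about {expected_rows:.0f} rows")
```

Under these assumptions the query touches only a few hundred of the four billion fact rows, which is why the composite index is so effective here.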

79
Forcing a Cartesian Product Join

• Add an additional “join_value” column in each
dimensional table.
• Set join_value to the same value in all rows of the
dimensional tables.
• Add additional where clause predicates joining on this
column between dimensional tables.

NOTE: This shouldn't be necessary with a “smart” optimizer.

80
Forcing a Cartesian Product Join
Sample code:

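-- The join_value predicates link the dimension tables to each other,
-- steering the optimizer toward a Cartesian product of the filtered dimensions.
select sum(d_sales_detail.sales_amt)
from d_sales_detail
    ,store
    ,item
    ,period
where d_sales_detail.store_id = store.store_id
and d_sales_detail.item_id = item.item_id
and d_sales_detail.day_dt = period.day_dt
-- dimension filters
and period.day_dt between '23-NOV-2000' and '24-DEC-2000'
and item.trade_style_cd = 'BARBIE'
and store.state_cd = 'CA'
-- forced joins between dimension tables
and store.join_value = period.join_value
and store.join_value = item.join_value
and period.join_value = item.join_value
;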

81
Star Schema for High Performance
Problem: What if I want to know raincoat sales in the first week of
January regardless of store?

Answer: Performance advantage of composite index in
traditional RDBMS is severely impaired!
• B-tree indexing techniques do not allow for flexibility in the
use of dimensions for query purposes.
• Bit indexing (and variations thereof) often allows much more
generality in achieving high performance from a star schema.
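
To illustrate why bit indexing is more flexible, here is a toy bitmap index in Python. It is a sketch of the general idea only, not any vendor's implementation; the rows, column positions, and helper names are all hypothetical. Each distinct value gets a bit vector over row positions, and any combination of dimensions can be filtered by AND-ing vectors.

```python
# Hypothetical fact rows: (item category, store city, week)
ROWS = [
    ("raincoat", "Boston", 1),
    ("umbrella", "Boston", 1),
    ("raincoat", "Denver", 1),
    ("raincoat", "Boston", 2),
]


def build_bitmap(rows, column):
    """Map each distinct value in `column` to an int used as a bit vector over row positions."""
    bitmaps = {}
    for pos, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << pos)
    return bitmaps


category_ix = build_bitmap(ROWS, 0)
store_ix = build_bitmap(ROWS, 1)
week_ix = build_bitmap(ROWS, 2)


def matching_rows(*bitmaps):
    """AND the given bit vectors and return the positions of the surviving rows."""
    combined = ~0  # all bits set
    for bitmap in bitmaps:
        combined &= bitmap
    return [pos for pos in range(len(ROWS)) if (combined >> pos) & 1]


# Raincoats in week 1, regardless of store: no store index is needed.
any_store = matching_rows(category_ix["raincoat"], week_ix[1])
# Adding the store filter is just one more AND.
boston_only = matching_rows(category_ix["raincoat"], week_ix[1], store_ix["Boston"])
print(any_store)    # [0, 2]
print(boston_only)  # [0]
```

Unlike a composite item:store:day B-tree, dropping the store predicate costs nothing here, because each dimension is indexed independently.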

82
Star Schema for High Performance

Bottom Line:
• It is not at all unusual to obtain an order of
magnitude (or more) in performance advantage
using a star schema with advanced indexing versus
a more traditional relational database
implementation.
• Despite what vendors may tell you, star schemas
cannot be effectively implemented for all DSS
business applications and/or data models.

83
ROLAP
• Relational OLAP often makes heavy use of
summary tables to provide near instantaneous
access for multi-dimensional queries.
• Foundation is usually a star schema or
snowflake database design.
• Allows OLAP with much larger data sets than
multi-dimensional database (MDD) products
using cube structures (MOLAP).

84
ROLAP
Number of summary tables can get very large if
discipline is not enforced...

Assume a retail database with the following two
dimensions on the fact table...
Calendar: Day, Week, Period, Quarter, Year, All Days
Geography: Store, Zone, District, Region, All Stores

85
ROLAP
Summary tables in a naive implementation require all
combinations of the dimensions at each aggregation
level...

            Store  Zone  District  Region  All Stores
All Days      13    19      24       28        30
Year           9    15      22       27        29
Quarter        6    11      18       23        26
Period         4     8      14       21        25
Week           2     5      10       17        20
Day            1     3       7       12        16

30 summary tables! ... Add in item, SKU, subcategory,
category, and all items...now we are up to 150 pre-aggregates!
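
The counts of 30 and 150 are just the product of the number of levels in each hierarchy. A minimal sketch, using the level names from these slides (the `summary_table_count` name is illustrative):

```python
from itertools import product
from math import prod

CALENDAR = ["Day", "Week", "Period", "Quarter", "Year", "All Days"]
GEOGRAPHY = ["Store", "Zone", "District", "Region", "All Stores"]
ITEM = ["Item", "SKU", "Subcategory", "Category", "All Items"]


def summary_table_count(*hierarchies):
    """Number of level combinations, one summary table per combination."""
    return prod(len(levels) for levels in hierarchies)


two_dims = summary_table_count(CALENDAR, GEOGRAPHY)
three_dims = summary_table_count(CALENDAR, GEOGRAPHY, ITEM)
print(two_dims)    # 30
print(three_dims)  # 150

# The individual tables can be enumerated the same way.
tables = list(product(CALENDAR, GEOGRAPHY))
```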
86
ROLAP
Summary tables are more of a maintenance issue than a
storage issue in most production implementations.
 Notice that summary tables get much smaller as dimensions
get less detailed (e.g., year vs. day).
 Should plan for double the size of the unsummarized data for
ROLAP summaries in most environments.
 Every detail record that is received into warehouse must
aggregate into EVERY summary table (assuming "to-date"
summaries are maintained).
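
The maintenance cost described in the last bullet can be sketched as follows: each incoming detail record is folded into one running total per combination of levels. The level names, record layout, and `load_detail` function are hypothetical and shortened to three levels per dimension to keep the example small.

```python
from collections import defaultdict
from itertools import product

CALENDAR_LEVELS = ("day", "week", "all_days")
GEOGRAPHY_LEVELS = ("store", "region", "all_stores")

# One running-total table per (calendar level, geography level) pair.
summaries = {
    pair: defaultdict(float)
    for pair in product(CALENDAR_LEVELS, GEOGRAPHY_LEVELS)
}


def load_detail(record):
    """Fold one detail record into every summary table."""
    for (cal, geo), table in summaries.items():
        # Levels such as "all_days" are not fields of the record, so they map to "ALL".
        key = (record.get(cal, "ALL"), record.get(geo, "ALL"))
        table[key] += record["amount"]


load_detail({"day": "2000-01-03", "week": "2000-W01",
             "store": "S1", "region": "North", "amount": 10.0})
load_detail({"day": "2000-01-04", "week": "2000-W01",
             "store": "S2", "region": "North", "amount": 5.0})

print(len(summaries))                                          # 9
print(summaries[("week", "region")][("2000-W01", "North")])    # 15.0
print(summaries[("all_days", "all_stores")][("ALL", "ALL")])   # 15.0
```

Every load performs one update per summary table, so the work per record grows with the number of summaries even when each table stays small.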

87
ROLAP
Warning: Do not assume that dimensions are always simple hierarchies.

Example: Items are not just category, subcategory, SKU, and atomic
item.... what about trade styles or manufacturer?

Now we need summary tables along these lines as well...another 120
summary tables!

Calendar vs. accounting period vs. billing cycle can be even worse...

88
ROLAP

Many ROLAP products have devised ways to reduce the
number of summary tables:

• Ability to build summaries on-the-fly as demanded by end-user
applications.
• Ability to aggregate efficiently from a subset of the summary
tables.
• Tools exist in some products to assist DBAs in selecting the
“best” aggregations to build.
• HOLAP (Hybrid OLAP) tools allow co-existence of pre-built
cubes alongside relational OLAP structures.

89
Intelligent Aggregation Selection
 Maximum performance boost implies lots of disk
for every pre-calculation.
 Minimum performance boost implies no disk with
zero pre-calculation.
 Strategy is to use meta data to heuristically
determine optimum set of aggregates from which all
other aggregates can be derived.
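
One ingredient of such a strategy is knowing which stored aggregates can answer a given request. The sketch below is a simplification, not the method of any particular product: it assumes each dimension is a strict linear hierarchy (which, as warned earlier, is not always true), and the stored set is hypothetical.

```python
# Levels ordered from finest to coarsest; assumed to form a strict chain.
HIERARCHIES = (
    ("Day", "Week", "Period", "Quarter", "Year", "All Days"),
    ("Store", "Zone", "District", "Region", "All Stores"),
)


def can_derive(target, source):
    """True when `source` is at the same or a finer level than `target` on every dimension."""
    return all(
        levels.index(s) <= levels.index(t)
        for levels, t, s in zip(HIERARCHIES, target, source)
    )


def coarseness(aggregate):
    """Higher means more summarized, hence (usually) fewer rows to scan."""
    return sum(levels.index(level) for levels, level in zip(HIERARCHIES, aggregate))


def best_source(target, stored):
    """Pick the coarsest stored aggregate that can still answer `target`, or None."""
    candidates = [s for s in stored if can_derive(target, s)]
    return max(candidates, key=coarseness) if candidates else None


# Hypothetical set of aggregates that have been materialized.
STORED = [("Day", "Store"), ("Week", "Zone"), ("Quarter", "Region")]

print(best_source(("Year", "Region"), STORED))  # ('Quarter', 'Region')
print(best_source(("Week", "Store"), STORED))   # ('Day', 'Store')
```

A real selection tool would also weigh table sizes and query frequencies from the metadata; this sketch only captures the derivability check.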

90
Aggregate Wizards

91
Fact Table Aggregates
 Enhance performance on common queries at
coarser granularities.
 Save space to permit storing more history than
possible with finer granularities.
 Take advantage of need to store other facts
(with similar samples) at a particular
granularity.

92
Aggregate Advice
 Coarser granularity decreases potential
cardinality, but usually increases density (e.g.,
daily summary table is typically twice the size
of weekly summary table - not seven times).
 Strongly consider omitting candidate aggregates
where expected cardinality is more than 10%
that of next finer granularity stored.
 Keep the detail for drill down, even if you
deploy aggregates for performance.
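
The 10% guideline can be expressed as a one-line check. The function name and the row counts are hypothetical; the 0.5 ratio mirrors the remark that a weekly summary is typically about half the size of the daily one.

```python
def worth_building(candidate_rows, next_finer_rows, threshold=0.10):
    """Apply the 10% rule of thumb: build the aggregate only if it is
    at most `threshold` times the size of the next finer stored level."""
    return candidate_rows <= threshold * next_finer_rows


daily_rows = 1_000_000               # hypothetical daily summary size
weekly_rows = daily_rows // 2        # weekly is typically about half of daily
yearly_rows = 20_000                 # hypothetical yearly summary size

print(worth_building(weekly_rows, daily_rows))  # False: 50% of the finer level
print(worth_building(yearly_rows, daily_rows))  # True: 2% of the finer level
```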

93
Bottom Line
 There are many implementation techniques for
delivery of an OLAP environment.

 Must fully consider the performance, scalability,
complexity, and flexibility characteristics when
deciding between MOLAP, ROLAP, and HOLAP.

 Understand your tools and RDBMS!

94
MOLAP Vs. ROLAP

95
Implementation Issues
 Data design and preparation
 Administration and performance
 OLAP platforms
 OLAP tools and products

96
Data Design and Preparation
 Characteristics of data
 Stores and uses much less data compared to a DW
 Data is summarized. You will rarely find data at the lowest
levels of detail as in the DW
 Data is more flexible for processing and analysis partly
because there is much less data to work with
 Every instance of the OLAP system in your environment is
customized for the purpose that instance serves
 OLAP data is generally customized
 Types and levels
 Static and dynamic summary data
 Permanent and transient detailed data

97
Administration
 Administering and managing OLAP systems should be
handled together with that of the DW environment
 Some considerations
 Expectations on what data would be accessed and how
 Selection of the right business dimensions
 Selection of the right filters in loading data from the DW
 Choosing the aggregation, summarization, and precalculation
 Size of the multidimensional database
 Access and security privileges
 Backup and restore facilities
 Drill-through to the data warehouse; drill-through to another
OLAP instance

98
Performance
 OLAP takes most of the queries that would normally
run against the DW
 OLAP is designed for complex queries; so it should
enhance overall query performance of the DW
environment
 OLAP can precalculate and pre-aggregate data for
quick response

99
OLAP Platforms
 Usually, the data warehouse and OLAP systems reside
on the same platform at the start. Later, when the data
warehouse becomes large and OLAP is a common task,
the OLAP system is moved to another platform
 A separate platform is needed when
 The size and usage of the DW consumes all resources
 Many departmental users desire OLAP capabilities
 The stability and performance of OLAP degrades
 OLAP tools require a different platform configuration than
the DW

100
OLTP vs. OLAP

                      OLTP                             OLAP
User                  Clerk, IT professional           Knowledge worker
Function              Day-to-day operations            Decision support
DB Design             Application oriented             Subject oriented
Data                  Current, up-to-date, detailed,   Historical, summarized,
                      flat relational, isolated        multidimensional, consolidated
Usage                 Repetitive                       Ad-hoc
Unit of work          Short, simple transactions       Complex queries
(Evaluation) Metric   Transaction throughput           Query throughput, response
105
Things to Consider

• Assess Needs Accurately! What users really need, not what they say they want!

• Involve End-users! At every stage of the project!

• Requirements Will Be Fuzzy! Let them be…

• Storage & Processing Architectures Come Last! Choose only after properly understanding business needs.

• Beware of Biased Advice! From OLAP vendors.

• Standardization is Not Good! Use a limited number of OLAP tools to optimally cover a wide range of applications.
106
OLAP Issues

 Query Performance & Reliability


 Integration & Flexibility
 Capacity & Scalability

 Exponential Database Growth


 Total Cost of Ownership (TCO)
 Rapid Technological Changes

107
