
Data and Information Management
Week 10 - OLAP
Online Analytical
Processing (OLAP)
• Original definition - The dynamic synthesis,
analysis, and consolidation of large volumes
of multi-dimensional data, Codd (1993).

• Describes a technology that is designed to
optimize the storing and querying of large
volumes of multi-dimensional data that is
aggregated (summarized) to various levels of
detail to support the analysis of this data.

Online Analytical
Processing (OLAP)
• Enables users to gain a deeper understanding
and knowledge about various aspects of their
corporate data through fast, consistent,
interactive access to a wide variety of possible
views of the data.

• Allows users to view corporate data in such a
way that it is a better model of the true
dimensionality of the enterprise.

Online Analytical
Processing (OLAP)
• Can easily answer ‘who?’ and ‘what?’
questions; however, the ability to answer ‘why?’
type questions distinguishes OLAP from
general-purpose query tools.

• Types of analysis range from basic
navigation and browsing (slicing and dicing)
to calculations, to more complex analyses
such as time series and complex modeling.

OLAP Benchmarks
• OLAP Council published an analytical
processing benchmark referred to as the
APB-1 (OLAP Council, 1998).

• Aim is to measure a server’s overall OLAP
performance rather than the performance of
individual tasks.

OLAP Benchmarks
• APB-1 assesses the most common business
operations including:
– bulk loading of data from internal or external
data sources;
– incremental loading of data from operational
systems;
– aggregation of input level data along hierarchies;

OLAP Benchmarks
• APB-1 assesses the most common business
operations including (continued):
– calculation of new data based on business
models;
– time series analysis;
– queries with a high degree of complexity;
– drill-down through hierarchies;
– ad hoc queries;
– multiple online sessions.

OLAP Benchmarks
• OLAP applications are judged on their ability to
provide just-in-time (JIT) information, a core
requirement of supporting effective decision-
making.

• This requirement goes beyond processing
performance and includes the ability to
model complex business relationships and to
respond to changing business requirements.

OLAP Benchmarks
• APB-1 uses a standard benchmark metric
called AQM (Analytical Queries per Minute).

• AQM represents the number of analytical
queries processed per minute including data
loading and computation time. Thus, the
AQM incorporates data loading
performance, calculation performance, and
query performance into a single metric.

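As a rough illustration of how one AQM figure can fold loading, calculation, and query time together, the sketch below divides a query count by the total elapsed minutes. The function name and all timings are hypothetical, and the exact measurement rules are defined by the APB-1 specification, not by this example.

```python
def analytical_queries_per_minute(query_count, load_seconds, compute_seconds, query_seconds):
    """Queries per minute over the total elapsed time (load + compute + query)."""
    total_seconds = load_seconds + compute_seconds + query_seconds
    if total_seconds <= 0:
        raise ValueError("total elapsed time must be positive")
    return query_count / (total_seconds / 60.0)


# Hypothetical run: 1000 queries, 10 min loading, 5 min computing, 5 min querying.
aqm = analytical_queries_per_minute(1000, 600, 300, 300)
print(aqm)
```

Because loading and computation time sit in the denominator, a server that answers queries quickly but loads data slowly still scores a lower figure.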
OLAP Benchmarks
• Publication of APB-1 benchmark results must
include both the database schema and all code
required for executing the benchmark.

• An essential requirement of all OLAP
applications is the ability to provide users with
JIT information, which is necessary to make
effective decisions about an organization's
strategic directions.

OLAP Applications
• JIT information is computed data that usually
reflects complex relationships and is often
calculated on the fly. Also, as data
relationships may not be known in advance,
the data model must be flexible.

Examples of OLAP
applications in various
functional areas

OLAP Applications
• Although OLAP applications are found in
widely divergent functional areas, they all
have the following key features:
– multi-dimensional views of data
– support for complex calculations
– time intelligence

OLAP Applications -
multi-dimensional views
of data
• Core requirement of building a ‘realistic’
business model.

• Provides basis for analytical processing through
flexible access to corporate data.

• The underlying database design that provides
the multi-dimensional view of data should treat
all dimensions equally.

OLAP Applications -
support for complex
calculations
• Must provide a range of powerful
computational methods such as those
required by sales forecasting, which uses
trend algorithms such as moving averages
and percentage growth.

• Mechanisms for implementing
computational methods should be clear and
non-procedural.

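The two trend calculations named above can be sketched in a few lines. This is a minimal illustration with hypothetical function names and revenue figures, not the method of any particular OLAP product.

```python
def moving_average(values, window):
    """Simple moving average: one output per full window of `window` values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]


def percentage_growth(values):
    """Growth of each period relative to the previous one, in percent.

    Assumes no previous-period value is zero.
    """
    return [(curr - prev) / prev * 100.0
            for prev, curr in zip(values, values[1:])]


quarterly_revenue = [100.0, 110.0, 121.0, 90.0]  # hypothetical figures
print(moving_average(quarterly_revenue, 2))
print(percentage_growth(quarterly_revenue))
```

An OLAP tool would typically expose such calculations declaratively, in line with the non-procedural requirement, instead of asking the user to write loops.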
OLAP Applications – time
intelligence
• Key feature of almost any analytical application
as performance is almost always judged over
time.

• Time hierarchy is not always used in the same
manner as other hierarchies.

• Concepts such as year-to-date and
period-over-period comparisons should be
easily defined.

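To make year-to-date and period-over-period concrete, here is a small sketch over monthly figures. The data, the `(year, month)` key layout, and the function names are assumptions made for illustration only.

```python
# Hypothetical monthly revenue keyed by (year, month).
revenue = {
    (2023, 1): 40, (2023, 2): 50, (2023, 3): 60,
    (2024, 1): 44, (2024, 2): 55, (2024, 3): 63,
}


def year_to_date(data, year, month):
    """Sum of all values in `year` from month 1 through `month` inclusive."""
    return sum(v for (y, m), v in data.items() if y == year and m <= month)


def year_over_year_change(data, year, month):
    """Difference between a month and the same month one year earlier."""
    return data[(year, month)] - data[(year - 1, month)]


ytd_feb_2024 = year_to_date(revenue, 2024, 2)            # 44 + 55
yoy_mar_2024 = year_over_year_change(revenue, 2024, 3)   # 63 - 60
print(ytd_feb_2024, yoy_mar_2024)
```

Both functions depend on the ordering of periods, which is one reason the time hierarchy is treated differently from other dimensions.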
Multi-dimensional Data
and OLAP cubes
• Multi-dimensional data consists of facts (numeric
measurements) such as property sales revenue
data and the association of this data with
dimensions such as location (of the property)
and time (of the property sale).

• Which is the best representation of
multi-dimensional data: relational table, matrix or
data cube?

Multi-dimensional Data
as 3-field Table versus
2-D Matrix
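The comparison named in this slide title can be sketched as follows: the same facts are held first as a 3-field table (city, quarter, revenue) and then as a 2-D matrix with cities as rows and quarters as columns. The city names and revenue values are hypothetical.

```python
# Hypothetical property-sales facts as a 3-field table: (city, quarter, revenue).
table = [
    ("Glasgow", "Q1", 10), ("Glasgow", "Q2", 12),
    ("London", "Q1", 20), ("London", "Q2", 25),
]

cities = sorted({city for city, _, _ in table})
quarters = sorted({quarter for _, quarter, _ in table})

# 2-D matrix: one row per city, one column per quarter; None marks a missing cell.
matrix = [[None] * len(quarters) for _ in cities]
for city, quarter, revenue in table:
    matrix[cities.index(city)][quarters.index(quarter)] = revenue

print(quarters)
for city, row in zip(cities, matrix):
    print(city, row)
```

The table repeats the dimension values on every row, whereas the matrix stores each dimension value once as a row or column label.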

Multi-dimensional Data
as 4-field Table versus
3-D Cube

Multi-dimensional Data
as series of 3-D Cubes

Multi-dimensional data
and OLAP cubes
• We usually think of cubes as solid 3-D structures
with equal sides. However, an OLAP cube is an
n-dimensional structure (with sides that need not
be equal).

• An alternative representation for n-dimensional
data is to consider a data cube as a lattice of
cuboids. Each cuboid represents a subset of the
given dimensions.
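Since each cuboid corresponds to a subset of the dimensions, a cube with n dimensions has 2^n cuboids. The sketch below enumerates them for a hypothetical three-dimensional example; the dimension names are chosen for illustration.

```python
from itertools import combinations

dimensions = ("location", "time", "type")  # hypothetical 3-D example

# Every subset of the dimensions is one cuboid, from the empty subset
# (a single grand total) up to the full set (the most detailed data).
cuboids = [subset
           for size in range(len(dimensions) + 1)
           for subset in combinations(dimensions, size)]

for cuboid in cuboids:
    print(len(cuboid), cuboid)
```

With three dimensions this yields eight cuboids; the count doubles with each added dimension, which is why tools rarely precompute all of them.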

Multi-dimensional data
and OLAP cubes

Dimensional Hierarchy
• The lattice of cuboids does not show the
hierarchies that are commonly associated
with dimensions.

• A dimensional hierarchy defines mappings
from a set of lower-level concepts to
higher-level concepts.
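One simple way to picture such a mapping is a lookup from each concept to its parent one level up. The location hierarchy below (city, country, and a top level) is a hypothetical example, as are the names used.

```python
# Hypothetical location hierarchy: each concept maps to its parent one level up.
parent = {
    "Glasgow": "Scotland", "Aberdeen": "Scotland", "London": "England",
    "Scotland": "UK", "England": "UK",
}


def ancestors(concept):
    """Return the chain of higher-level concepts above `concept`, lowest first."""
    chain = []
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


print(ancestors("Glasgow"))
```

Following the chain upward is what makes aggregation along a hierarchy possible: every city-level fact can be attributed to exactly one higher-level concept.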

Dimensional Hierarchy

OLAP Operations
• The analytical operations that can be
performed on data cubes include:
– Roll-up
– Drill-down
– Slice and Dice
– Pivot

OLAP Operations
• Roll-up performs aggregations on the data by
moving up the dimensional hierarchy or by
dimensional reduction e.g. 4-D sales data to 3-D
sales data.
• Drill-down is the reverse of roll-up and involves
revealing the detailed data that forms the
aggregated data. Drill-down can be performed
by moving down the dimensional hierarchy or
by dimensional introduction e.g. 3-D sales data
to 4-D sales data.
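Both forms of roll-up can be sketched on a small cube. The example uses 3-D data reduced to 2-D (the text's example goes from 4-D to 3-D); the facts, the city-to-country mapping, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical 3-D facts: (city, quarter, property type) -> revenue.
facts = {
    ("Glasgow", "Q1", "Flat"): 10, ("Glasgow", "Q1", "House"): 5,
    ("Glasgow", "Q2", "Flat"): 12, ("London", "Q1", "Flat"): 20,
}
city_to_country = {"Glasgow": "Scotland", "London": "England"}  # assumed hierarchy


def roll_up_drop_type(cube):
    """Dimensional reduction: sum out the property type, 3-D to 2-D."""
    out = defaultdict(int)
    for (city, quarter, _ptype), revenue in cube.items():
        out[(city, quarter)] += revenue
    return dict(out)


def roll_up_city_to_country(cube, mapping):
    """Move up the location hierarchy: replace each city by its country."""
    out = defaultdict(int)
    for (city, quarter, ptype), revenue in cube.items():
        out[(mapping[city], quarter, ptype)] += revenue
    return dict(out)


by_city_quarter = roll_up_drop_type(facts)
by_country = roll_up_city_to_country(facts, city_to_country)
print(by_city_quarter)
print(by_country)
```

Drill-down is the reverse direction: starting from `by_city_quarter` and returning to the detailed `facts`, which is only possible if the detailed data has been kept.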

OLAP Operations
• Slice and dice - ability to look at data from
different viewpoints. The slice operation
performs a selection on one dimension of
the data whereas dice uses two or more
dimensions. For example, a slice of sales
revenue selects type = ‘Flat’, whereas a dice
selects type = ‘Flat’ and time = ‘Q1’.
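The slice and dice selections in the example can be sketched as filters over fact rows. The rows and the helper name `select` are hypothetical; only the conditions on type and time come from the text.

```python
# Hypothetical fact rows; dimension names follow the example in the text.
facts = [
    {"city": "Glasgow", "time": "Q1", "type": "Flat", "revenue": 10},
    {"city": "Glasgow", "time": "Q2", "type": "Flat", "revenue": 12},
    {"city": "London", "time": "Q1", "type": "House", "revenue": 30},
    {"city": "London", "time": "Q1", "type": "Flat", "revenue": 20},
]


def select(rows, **conditions):
    """Keep rows whose dimension values match every given condition."""
    return [r for r in rows
            if all(r[dim] == val for dim, val in conditions.items())]


flat_slice = select(facts, type="Flat")                # slice: one dimension
flat_q1_dice = select(facts, type="Flat", time="Q1")   # dice: two dimensions

slice_total = sum(r["revenue"] for r in flat_slice)
dice_total = sum(r["revenue"] for r in flat_q1_dice)
print(slice_total, dice_total)
```

The same helper covers both operations; the only difference is how many dimensions carry a condition.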

OLAP Operations
• Pivot - ability to rotate the data to provide
an alternative view of the same data, e.g.
sales revenue data displayed using the
location (city) as x-axis against time (quarter)
as the y-axis can be rotated so that time
(quarter) is the x-axis against location (city) as
the y-axis.
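For a 2-D view, the rotation described above amounts to transposing the grid. The labels and values below are hypothetical.

```python
cities = ["Glasgow", "London"]   # hypothetical x-axis labels (columns)
quarters = ["Q1", "Q2", "Q3"]    # hypothetical y-axis labels (rows)

# Original view: one row per quarter, one column per city.
view = [
    [10, 20],
    [12, 25],
    [9, 22],
]

# Pivot: swap the axes by transposing, giving one row per city.
pivoted = [list(column) for column in zip(*view)]

for city, row in zip(cities, pivoted):
    print(city, row)
```

No values change during a pivot; only the assignment of dimensions to axes does.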

OLAP Tools
• There are many varieties of OLAP tools
available in the marketplace.

• This choice has resulted in some confusion
with much debate regarding what OLAP
actually means to a potential buyer and in
particular which architectures are available
for OLAP tools.

Codd’s Rules for OLAP
Systems
• In 1993, E.F. Codd formulated twelve rules as
the basis for selecting OLAP tools.

Codd’s Rules for OLAP
Systems
• Multi-dimensional conceptual view
• Transparency
• Accessibility
• Consistent reporting performance
• Client-server architecture
• Generic dimensionality

Codd’s Rules for OLAP
Systems
• Dynamic sparse matrix handling
• Multi-user support
• Unrestricted cross-dimensional operations
• Intuitive data manipulation
• Flexible reporting
• Unlimited dimensions and aggregation levels

Codd’s Rules for OLAP
Systems
• There are proposals to redefine or
extend the rules, for example to also
include:
– Comprehensive database management tools
– Ability to drill down to detail (source record)
level
– Incremental database refresh
– SQL interface to the existing enterprise
environment

Categories of OLAP Tools
• OLAP tools are categorized according to the
architecture used to store and process multi-
dimensional data.

• There are four main categories:
– Multi-dimensional OLAP (MOLAP)
– Relational OLAP (ROLAP)
– Hybrid OLAP (HOLAP)
– Desktop OLAP (DOLAP)

Multi-dimensional OLAP
(MOLAP)
• Use specialized data structures and multi-
dimensional Database Management Systems
(MDDBMSs) to organize, navigate, and
analyze data.

• Data is typically aggregated and stored
according to predicted usage to enhance
query performance.

Multi-dimensional OLAP
(MOLAP)
• Use array technology and efficient storage
techniques that minimize the disk space
requirements through sparse data
management.

• Provides excellent performance when data is
used as designed, and the focus is on data
for a specific decision-support application.

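The idea behind sparse data management can be illustrated with a dictionary that stores only the populated cells of an array. This is a deliberately simple sketch with hypothetical sizes and values; actual MOLAP engines use their own, more elaborate storage schemes.

```python
# Hypothetical 3 x 4 x 2 cube in which only three cells hold data.
shape = (3, 4, 2)
sparse_cells = {(0, 1, 0): 10, (2, 3, 1): 7, (1, 0, 0): 4}


def get(cells, index, default=0):
    """Read a cell; positions that were never stored fall back to `default`."""
    return cells.get(index, default)


dense_cell_count = shape[0] * shape[1] * shape[2]
print(dense_cell_count, len(sparse_cells))
print(get(sparse_cells, (2, 3, 1)), get(sparse_cells, (0, 0, 0)))
```

A dense array would reserve space for all 24 cells, while the sparse form stores 3, and the gap widens quickly as dimensions are added.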
Multi-dimensional OLAP
(MOLAP)
• Traditionally, require a tight coupling with
the application layer and presentation layer.

• Recent trends segregate the OLAP from the
data structures through the use of published
application programming interfaces (APIs).

Typical Architecture for
MOLAP Tools

MOLAP Tools -
Development Issues
• Underlying data structures are limited in
their ability to support multiple subject areas
and to provide access to detailed data.

• Navigation and analysis of data is limited
because the data is designed according to
previously determined requirements.

MOLAP Tools -
Development Issues
• MOLAP products require a different set of
skills and tools to build and maintain the
database, thus increasing the cost and
complexity of support.

Relational OLAP (ROLAP)
• Fastest-growing style of OLAP technology
due to requirements to analyze ever-
increasing amounts of data and the
realization that users cannot store all the
data they require in MOLAP databases.

Relational OLAP (ROLAP)
• Supports RDBMS products through a metadata
layer, which avoids the need to create a static
multi-dimensional data structure and facilitates
the creation of multiple multi-dimensional views
of the two-dimensional relation.

Relational OLAP (ROLAP)
• To improve performance, some products use
SQL engines to support the complexity of
multi-dimensional analysis, while others
recommend, or require, the use of highly
denormalized database designs such as the
star schema.
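A star schema places one fact table at the center, with foreign keys pointing to surrounding dimension tables. The sketch below builds a tiny one in an in-memory SQLite database and runs a typical aggregate query; all table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per member of each dimension.
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
    -- Fact table: numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        location_id INTEGER REFERENCES dim_location(location_id),
        time_id INTEGER REFERENCES dim_time(time_id),
        revenue INTEGER
    );
    INSERT INTO dim_location VALUES (1, 'Glasgow'), (2, 'London');
    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 12), (2, 1, 20), (2, 1, 5);
""")

# Revenue by city and quarter: join the facts to both dimensions, then aggregate.
rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Each multi-dimensional question becomes a join from the fact table to the relevant dimensions followed by a GROUP BY, which is the kind of SQL a ROLAP layer generates.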

Typical Architecture for
ROLAP Tools

ROLAP Tools -
Development Issues
• Performance problems associated with the
processing of complex queries that require
multiple passes through the relational data.

• Middleware to facilitate the development of
multi-dimensional applications (software
that converts the two-dimensional relation
into a multi-dimensional structure).

ROLAP Tools -
Development Issues
• Development of an option to create
persistent, multi-dimensional structures with
facilities to assist in the administration of
these structures.

Hybrid OLAP (HOLAP)
• Provide limited analysis capability, either
directly against RDBMS products, or by using
an intermediate MOLAP server.

• Deliver selected data directly from the DBMS
or via a MOLAP server to the desktop (or
local server) in the form of a datacube,
where it is stored, analyzed, and maintained
locally.

Hybrid OLAP (HOLAP)
• Promoted as being relatively simple to install
and administer with reduced cost and
maintenance.

Typical Architecture for
HOLAP Tools

HOLAP Tools -
Development Issues
• Architecture results in significant data
redundancy and may cause problems for
networks that support many users.

• Ability of each user to build a custom datacube
may cause a lack of data consistency among
users.

• Only a limited amount of data can be efficiently
maintained.

Desktop OLAP (DOLAP)
• Store the OLAP data in client-based files and
support multi-dimensional processing using
a client multi-dimensional engine.

• Requires that relatively small extracts of data
are held on client machines. They may be
distributed in advance, or created on
demand (possibly through the Web).

Thank You
