
Principles of Dimensional Modeling
Design Requirements

• The design of the DW must directly reflect the way managers look at the business.
• It should capture the measurements of importance along with the parameters by which these measurements are viewed.
• It must facilitate data analysis, i.e., answering business questions.
What is Dimensional Modeling (DM)?

• DM is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access.
• It can be implemented using a relational or a multidimensional DBMS, with some restrictions.
• It is different from ER modeling.
• Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
• Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
• This characteristic "star-like" structure is often called a star schema.
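
The structure described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: the tables `fact_sales`, `dim_date`, `dim_product`, `dim_store` and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each has a single-part primary key.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: the multipart key has one component per dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2024-01-15", "2024-01"), (2, "2024-02-10", "2024-02")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Pen", "Stationery"), (2, "Lamp", "Furniture")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(1, "Pune"), (2, "Delhi")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10, 50.0), (1, 2, 2, 1, 40.0), (2, 1, 2, 4, 20.0)])

# A typical business question: sales amount "by" product category.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

Every query against such a schema has the same shape: join the fact table to the dimensions of interest, filter or group on dimension attributes, and aggregate the facts.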

ER Modeling

• A logical design technique that seeks to eliminate data redundancy
• Illuminates the microscopic relationships among data elements
• Perfect for OLTP systems
• Responsible for the success of transaction processing in relational databases
Problems with ER Model
Why are ER models NOT suitable for a DW?

• End users cannot understand or remember an ER model
• Many DWs have failed because of overly complex ER designs
• ER models are not optimized for complex, ad hoc queries
• Data retrieval becomes difficult because of normalization
• Browsing becomes difficult


ER vs Dimensional Modeling
• ER models are constructed to
  ➢ Remove redundant data (normalization)
  ➢ Facilitate retrieval of individual records having certain critical identifiers
  ➢ Thereby optimize OLTP performance
• The dimensional model supports the reporting and analytical needs of a data warehouse system.

Comparison between the ER and Dimensional Models

Dimensional Model | ER Model
------------------|---------
Supports ad hoc querying for business analyses and complex analyses | Supports OLTP
The data model is multidimensional | The data model has two dimensions
It is asymmetric | It is symmetric
Permits redundancy | Removes redundancy
It is extensible; the application is not changed | If the model is modified, applications are modified
It can be done independent of expected query patterns | It is variable in structure and very vulnerable to changes in the user's querying habits
Easy and understandable | Hard for people to visualize
Models the business practically | Models the micro relationships among data elements

Dimension Modeling Concepts

• Design goals: user understandability, query performance, resilience to change
• Components of DM:
  ➢ Fact tables
  ➢ Dimension tables

Inside Dimension table

• Dimension table key
• Large number of attributes
• Textual attributes
• Attributes not directly related to one another
• Flattened table, not normalized
• Ability to drill down/roll up
• Multiple hierarchies
• Fewer records than the fact table
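
Drill down and roll up on a flattened dimension can be illustrated as follows. This is a sketch under assumptions: the `dim_product` hierarchy (product, brand, category), the `total_by` helper, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened (not normalized) dimension: the whole hierarchy sits in one row.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product TEXT, brand TEXT, category TEXT
);
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);
INSERT INTO dim_product VALUES
    (1, 'Pen',      'Inkwell', 'Stationery'),
    (2, 'Pencil',   'Inkwell', 'Stationery'),
    (3, 'Notebook', 'Papyrus', 'Stationery');
INSERT INTO fact_sales VALUES (1, 10.0), (2, 5.0), (3, 20.0), (1, 2.5);
""")

HIERARCHY_LEVELS = {"product", "brand", "category"}

def total_by(level):
    """Sum the sales amount grouped by one level of the product hierarchy."""
    # The level is interpolated into SQL, so only known column names are allowed.
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    rows = conn.execute(f"""
        SELECT d.{level}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product d ON d.product_key = f.product_key
        GROUP BY d.{level}
    """).fetchall()
    return dict(rows)

by_category = total_by("category")  # rolled up to the top level
by_brand = total_by("brand")        # drill down one level
by_product = total_by("product")    # drill down to the lowest level
```

Because the hierarchy is stored in a single flattened table, moving between levels only changes the grouping column; no additional joins are needed.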

Inside Fact Table

• Concatenated fact table key
• Grain (level of detail) of the data identified
• Fully additive facts: can be summed across all dimensions
• Semi-additive facts: can be summed across some dimensions only
• Large number of records
• Few attributes
• Sparsity of data
• Degenerate dimensions
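
The difference between fully additive and semi-additive facts is easiest to see with numbers. The sketch below assumes a hypothetical account balance fact, a common example of a semi-additive measure: it can be summed across accounts but not across time.

```python
from collections import defaultdict

# Hypothetical end-of-day balance facts: (day, account, balance).
balances = [
    ("2024-01-01", "A", 100.0), ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0), ("2024-01-02", "B", 40.0),
]

# Summing across the account dimension is meaningful: total money held per day.
total_per_day = defaultdict(float)
for day, _account, balance in balances:
    total_per_day[day] += balance

# Summing across the time dimension is NOT meaningful for a balance:
# this figure is not an amount of money that ever existed.
naive_sum_over_time = sum(total_per_day.values())

# Across time, use an average or the closing value instead.
days = sorted(total_per_day)
average_total = sum(total_per_day[d] for d in days) / len(days)
closing_total = total_per_day[days[-1]]

print(dict(total_per_day), naive_sum_over_time, average_total, closing_total)
```

A fully additive fact such as a sales amount would not need this special handling; it can be summed along every dimension, including time.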

Factless Fact Table

• Some fact tables have no measured facts
• Useful to describe events and coverage: the tables record that something has or has not happened
• Often used to represent many-to-many relationships
• The only thing they contain is a concatenated key
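
A factless fact table can be sketched with a student attendance example. The table `fact_attendance`, the key values, and the `enrolled` set are hypothetical; the point is that the rows carry only keys, and analysis is done by counting rows or by comparing them with what could have happened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- No measures at all: a row only records that a student attended a course on a date.
CREATE TABLE fact_attendance (
    date_key INTEGER, student_key INTEGER, course_key INTEGER,
    PRIMARY KEY (date_key, student_key, course_key)
);
INSERT INTO fact_attendance VALUES
    (1, 10, 100), (1, 11, 100), (2, 10, 100), (2, 10, 200);
""")

# Events: "how many attendances per course?" is answered by counting rows.
attendance_per_course = dict(conn.execute(
    "SELECT course_key, COUNT(*) FROM fact_attendance GROUP BY course_key"
).fetchall())

# Coverage: "who did NOT attend course 100 on date 2?" compares the events
# with the set of students who could have attended (assumed enrolment list).
enrolled = {10, 11, 12}
attended = {row[0] for row in conn.execute(
    "SELECT student_key FROM fact_attendance WHERE date_key = 2 AND course_key = 100"
)}
absent = enrolled - attended
print(attendance_per_course, absent)
```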

Star Schema keys

• Primary keys
• Surrogate keys
• Foreign keys

Modeling Design Process
1. Identify the Business Process
   • Source of “measurements”
2. Identify the Grain
   • What does one row in the fact table represent or mean?
3. Identify the Dimensions
   • Descriptive context, true to the grain
4. Identify the Facts
   • Numeric additive measurements, true to the grain
Step 1 - Identify the Business Process

• This is a business activity typically tied to a source system.
• Not to be confused with a business department or function. An Orders dimensional model should support the activities of both Sales and Marketing.
• “If we establish departmentally bound dimensional models, we’ll inevitably duplicate data with different labels and terminology.”
Step 2 - Identify the Grain

• The level of detail associated with the fact table measurements.
• A critical step, necessary before steps 3 and 4.
• Preferably it should be at the most atomic level possible.
• “How do you describe a single row in the fact table?”
Step 3 - Identify the Dimensions

• The list of all the discrete, text-like attributes that emanate from the fact table.
• They are the “by” words used to describe the requirements.
• Each dimension could be thought of as an analytical “entry point” to the facts.
• “How do business people describe the data that results from the business process?”
Step 4 - Identify the Facts

• Must be true to the grain defined in step 2.
• Typical facts are numeric additive figures.
• Facts that belong to a different grain belong in a separate fact table.
• Facts are determined by answering the question, “What are we measuring?”
• Percentages and ratios, such as gross margin, are non-additive. The numerator and denominator should be stored in the fact table.
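
The last point can be checked with a small calculation. The sketch below uses two hypothetical fact rows and assumes gross margin is defined as (revenue - cost) / revenue; it shows why the ratio is recomputed from the stored numerator and denominator rather than aggregated directly.

```python
# Hypothetical fact rows storing the additive components of gross margin.
fact_rows = [
    {"revenue": 100.0, "cost": 60.0},   # row-level margin: 40%
    {"revenue": 300.0, "cost": 270.0},  # row-level margin: 10%
]

def gross_margin(revenue, cost):
    """Gross margin as a fraction of revenue (assumed definition)."""
    return (revenue - cost) / revenue

# Correct: sum the additive components first, then compute the ratio.
total_revenue = sum(row["revenue"] for row in fact_rows)
total_cost = sum(row["cost"] for row in fact_rows)
overall_margin = gross_margin(total_revenue, total_cost)

# Misleading: averaging the row-level ratios ignores how much revenue each row carries.
row_margins = [gross_margin(row["revenue"], row["cost"]) for row in fact_rows]
average_of_margins = sum(row_margins) / len(row_margins)

print(f"overall margin: {overall_margin:.3f}")
print(f"average of row margins: {average_of_margins:.3f}")
```

Here the overall margin is 70 / 400 = 0.175, while the average of the two row-level margins is 0.25, so a stored ratio could not have been rolled up correctly.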
Advantages of star schema

• Easy for users to understand
• Optimizes navigation
• Most suitable for query processing

DM: Advanced Topics

• Slowly changing dimensions
➢ Type 1 changes: correction of errors
  • Used when the old value of the attribute has no significance or can be discarded.
  • Easy and fast
➢ Type 2 changes: preservation of history
  • Partitions history so that fact tables properly reflect original values.
  • Requires the use of surrogate keys
  • Causes table growth due to additional history rows
  • Users must be aware of the added complexity
  • Effective dates can be kept on the dimension rows, but they are secondary to the cleaner surrogate-key joins to the fact table
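
Type 1 and Type 2 changes can be contrasted in a few lines of plain Python. This is a minimal sketch, not a prescribed implementation: the `customer_dim` rows, the helper functions `type1_change` and `type2_change`, and the sample customer are all hypothetical.

```python
from itertools import count

surrogate_keys = count(1)  # warehouse-assigned surrogate keys
customer_dim = []          # one dict per dimension row

def type2_change(customer_id, name, city, effective_from):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    for row in customer_dim:
        if row["customer_id"] == customer_id and row["effective_to"] is None:
            row["effective_to"] = effective_from
    new_row = {
        "customer_key": next(surrogate_keys), "customer_id": customer_id,
        "name": name, "city": city,
        "effective_from": effective_from, "effective_to": None,
    }
    customer_dim.append(new_row)
    return new_row["customer_key"]

def type1_change(customer_id, name):
    """Type 1: overwrite in place on every row; the old value is discarded."""
    for row in customer_dim:
        if row["customer_id"] == customer_id:
            row["name"] = name

fact_sales = []  # (customer_key, amount): facts point at the row valid at the time

key_pune = type2_change("C1", "Ann Smyth", "Pune", "2023-01-01")
fact_sales.append((key_pune, 100.0))
key_delhi = type2_change("C1", "Ann Smyth", "Delhi", "2024-01-01")  # customer moved
fact_sales.append((key_delhi, 80.0))
type1_change("C1", "Ann Smith")  # spelling correction, no history kept

# Old facts still join to the old city through their surrogate key.
city_of = {row["customer_key"]: row["city"] for row in customer_dim}
sales_by_city = {}
for customer_key, amount in fact_sales:
    city = city_of[customer_key]
    sales_by_city[city] = sales_by_city.get(city, 0.0) + amount
print(sales_by_city)
```

The join from fact to dimension uses only the surrogate key; the effective dates are kept for reference and are not needed to attribute the old sale to the old city.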
➢ Type 3 changes: tentative soft revisions
  • An additional attribute is used to capture the change.
  • Used less frequently than Type 1 or Type 2.
  • Relates to tentative changes in the source systems.
  • Used to compare performance under the old and new values.
  • Ability to track forward and backward
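
A Type 3 change can be sketched as an extra column that keeps the prior value next to the current one. The sales representative dimension, the `previous_territory` attribute, and the helper names below are hypothetical.

```python
# Hypothetical dimension keyed by surrogate key; one extra attribute holds the old value.
rep_dim = {
    1: {"rep": "Lee", "territory": "East", "previous_territory": None},
    2: {"rep": "Kim", "territory": "West", "previous_territory": None},
}
fact_sales = [(1, 100.0), (2, 50.0)]  # (rep_key, amount)

def type3_change(rep_key, new_territory):
    """Type 3: move the current value to the 'previous' column, then overwrite it."""
    row = rep_dim[rep_key]
    row["previous_territory"] = row["territory"]
    row["territory"] = new_territory

def sales_by(attribute):
    """Total sales grouped by either the current or the previous territory."""
    totals = {}
    for rep_key, amount in fact_sales:
        row = rep_dim[rep_key]
        # Rows that never changed fall back to their current value.
        value = row[attribute] or row["territory"]
        totals[value] = totals.get(value, 0.0) + amount
    return totals

type3_change(1, "West")  # tentative realignment of Lee's territory

new_view = sales_by("territory")           # as if the new alignment had always applied
old_view = sales_by("previous_territory")  # as if the change had not happened
print(new_view, old_view)
```

Both views are available from the same row, which is what allows performance to be compared forward and backward; only one prior value is retained, unlike the full history of a Type 2 change.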

• Large dimensions
• Rapidly changing dimensions

Snowflake Schema

Snowflaking is a method of normalizing the dimension tables in a star schema.

Advantages:
• Small savings in storage space
• Normalized structures are easier to update and maintain

Disadvantages:
• The schema is less intuitive, and end users are put off by the complexity
• Browsing through the contents becomes difficult
• Degraded query performance because of additional joins
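
The cost of the additional joins can be seen by asking the same question of a star and a snowflaked version of one dimension. The sketch below assumes hypothetical tables (`dim_product_star`, `dim_product_sf`, `dim_category`) and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is repeated on every product row.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);

-- Snowflake: the category is normalized out into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_sf (
    product_key INTEGER PRIMARY KEY, product TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES
    (1, 'Pen', 'Stationery'), (2, 'Pencil', 'Stationery'), (3, 'Lamp', 'Furniture');
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Furniture');
INSERT INTO dim_product_sf VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 5.0), (3, 40.0);
""")

# Star schema: one join from the fact table reaches the category.
star_result = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star d ON d.product_key = f.product_key
    GROUP BY d.category ORDER BY d.category
""").fetchall()

# Snowflake schema: the same question needs a second join.
snowflake_result = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_sf p ON p.product_key = f.product_key
    JOIN dim_category c   ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star_result, snowflake_result)
```

Both queries return the same totals; the snowflaked form saves the repeated category text at the price of one more join for every query that touches the category.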

Star Schema

Flattened Star

Normalized Star

CSE 5331/7331 F'07 25


Snowflake Schema



Aspect | Snowflake Schema | Star Schema
-------|------------------|------------
Joins | Higher number of joins | Fewer joins
Ease of use | More complex queries, hence less easy to understand | Simpler queries, hence easier to understand
Query performance | More foreign keys, hence longer query execution time | Fewer foreign keys, hence shorter query execution time
Ease of maintenance/change | No redundancy, hence easier to maintain and change | Has redundant data, hence less easy to maintain and change
Type of data warehouse | Good for small data warehouses/data marts | Good for large data warehouses
Dimension table | May have more than one dimension table for each dimension | Contains only a single dimension table for each dimension
Dimension table normalization | Third normal form (3NF) | Second normal form (2NF), denormalized

Fact Constellation schema
• It is shaped like a constellation of stars.
• For each star schema or snowflake schema it is possible to construct a fact constellation schema.
• This schema is more complex than the star or snowflake architecture because it contains multiple fact tables.
• It allows dimension tables to be shared among many fact tables.
• The solution is very flexible; however, it may be hard to manage and support.
• The main disadvantage of the fact constellation schema is a more complicated design, because many variants of aggregation must be considered.
• Each fact table is explicitly linked to the dimensions that are relevant to its facts. This may be useful when some facts are associated with a given dimension level and other facts with a deeper dimension level.
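
A fact constellation can be sketched as two fact tables sharing one dimension. The tables `fact_sales` and `fact_shipments`, the shared `dim_product`, and the `totals` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension shared by two fact tables.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE fact_sales     (product_key INTEGER, units_sold INTEGER);
CREATE TABLE fact_shipments (product_key INTEGER, units_shipped INTEGER);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 1);
INSERT INTO fact_shipments VALUES (1, 12), (2, 3);
""")

def totals(fact_table, measure):
    """Aggregate one fact table by the shared product dimension."""
    # Table and column names are fixed literals from this sketch, not user input.
    rows = conn.execute(f"""
        SELECT d.product, SUM(f.{measure})
        FROM {fact_table} f
        JOIN dim_product d ON d.product_key = f.product_key
        GROUP BY d.product
    """).fetchall()
    return dict(rows)

# Aggregate each fact table separately, then combine on the shared dimension.
# Joining the two fact tables directly would repeat rows and inflate the sums.
sold = totals("fact_sales", "units_sold")
shipped = totals("fact_shipments", "units_shipped")
report = {
    product: (sold.get(product, 0), shipped.get(product, 0))
    for product in sorted(set(sold) | set(shipped))
}
print(report)
```

Combining results per fact table is one of the aggregation variants the design has to account for; it works here because both fact tables use the same product keys.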

Dimensional Model Star Schema

Snow-Flake Schema in Dimensional Modeling

Fact Constellation Schema
