
Dimensional Modeling

What is a Data Model?


A data model is a conceptual representation of the data
structures (tables) required for a database. It is a powerful
way to express and communicate business requirements.
A data model helps the functional and technical teams
design the database.
Data modeling tools: ERwin, Oracle Designer,
PowerDesigner.
The two types of data modeling are:
1) Logical modeling
2) Physical modeling


Logical Modeling
Includes entities (tables), attributes (columns/fields) and
relationships (keys).
Uses business names for entities & attributes
Is independent of technology (platform, DBMS)
Is normalized to fourth normal form (4NF)
Physical Modeling
Includes tables, columns, keys, data types, validation rules,
database triggers, stored procedures, domains, and
access constraints
Uses more specific, less generic names for tables and
columns, such as abbreviated column names, within the
limits set by the database management system (DBMS) and
any company-defined standards
Includes primary keys and indices for fast data access.
Logical vs. Physical

Logical                                                    | Physical
Represents business information and defines business rules | Represents the physical implementation of the model in a database
Entity                                                     | Table
Attribute                                                  | Column
Primary Key                                                | Primary Key Constraint
Alternate Key                                              | Unique Constraint or Unique Index
Rule                                                       | Check Constraint, Default Value
Relationship                                               | Foreign Key
Definition                                                 | Comment
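
The logical-to-physical mapping above can be sketched with SQLite DDL driven from Python. This is only an illustration: the customer and cust_order tables, their columns, and the sample values are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables; each comment names the logical concept being implemented.
conn.executescript("""
CREATE TABLE customer (              -- entity -> table
    cust_id   INTEGER PRIMARY KEY,   -- primary key -> primary key constraint
    email     TEXT NOT NULL UNIQUE,  -- alternate key -> unique constraint
    status    TEXT NOT NULL DEFAULT 'ACTIVE'            -- rule -> default value
              CHECK (status IN ('ACTIVE', 'INACTIVE'))  -- rule -> check constraint
);
CREATE TABLE cust_order (
    order_id  INTEGER PRIMARY KEY,
    cust_id   INTEGER NOT NULL REFERENCES customer (cust_id)  -- relationship -> foreign key
);
""")

conn.execute("INSERT INTO customer (cust_id, email) VALUES (1, 'a@example.com')")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_email_rejected = rejected(
    "INSERT INTO customer (cust_id, email) VALUES (2, 'a@example.com')")
bad_status_rejected = rejected(
    "INSERT INTO customer (cust_id, email, status) VALUES (3, 'b@example.com', 'UNKNOWN')")
orphan_order_rejected = rejected(
    "INSERT INTO cust_order (order_id, cust_id) VALUES (10, 99)")
default_status = conn.execute(
    "SELECT status FROM customer WHERE cust_id = 1").fetchone()[0]
```

Each of the three rejected inserts trips a different physical constraint, which shows how business rules stated in the logical model end up enforced by the database.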
What is ER Modeling?
Entity-Relationship (ER) data modeling is used in OLTP systems,
which are transaction oriented.
Focus of OLTP Design
Individual data elements
Data relationships
Design goals
Accurately model business
Remove redundancy (normalized)

ER Modeling Shortcomings:
Complex
Unfamiliar to business people
Incomplete history
Slow query performance

Dimensional Modeling
Definition
Logical data model used to represent the measures and
dimensions that pertain to one or more business subject
areas
Dimensional Model = Star Schema
Can easily translate into multi-dimensional database
design if required
Overcomes ER design shortcomings

DM Advantages:
Understandable
Systematically represents history
Reliable join paths
High performance query
Enterprise scalability

ER vs. DM

ER                                                     | DM
Tables are units of storage                            | Cubes are units of storage
Data is normalized and used for OLTP                   | Data is denormalized and used in data warehouses and data marts
Several tables and chains of relationships among them  | Few tables; fact tables are connected to dimension tables
Detailed level of transactional data                   | Summary of bulky transactional data (aggregates and measures) used in business decisions
Normal reports                                         | User-friendly, interactive, drag-and-drop multidimensional OLAP reports
Dimension tables
A dimension table describes the business entities
of an enterprise, represented as hierarchical, categorical
information such as time, departments, locations, and
products. Dimension tables are sometimes called lookup or
reference tables.
Characteristics
Contain mostly textual content (character data)
Hold the dimensional attributes
Usually have a large number of attributes (wide)
Add flags and indicators that make it easy to perform
specific types of reports
Have small number of rows in comparison to fact tables
(most of the time)
Surrogate Key
A unique identifier generated by the RDBMS that is
not derived from any data in the database and whose only
significance is to act as the primary key. A surrogate key is
frequently a sequential number.
Each dimension table is assigned a unique primary key,
generated specifically for the data warehouse
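
A minimal sketch of a database-generated surrogate key, using SQLite from Python. The dealer_dim table name, the dealer_code natural key, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dealer_dim (
    dealer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, generated by the database
    dealer_code TEXT NOT NULL,                      -- natural key from the source system (hypothetical)
    dealer      TEXT NOT NULL
)""")

# The load supplies only source data; it never supplies dealer_key.
for code, name in [("DLR-NY-0042", "Empire Motors"), ("DLR-CA-0007", "Bay Autos")]:
    conn.execute(
        "INSERT INTO dealer_dim (dealer_code, dealer) VALUES (?, ?)", (code, name))

rows = conn.execute(
    "SELECT dealer_key, dealer_code FROM dealer_dim ORDER BY dealer_key").fetchall()
# rows == [(1, 'DLR-NY-0042'), (2, 'DLR-CA-0007')]
```

The key values 1 and 2 carry no business meaning; fact rows would reference them instead of the source system's dealer code.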

Dimension table contd
Example of EMP dimension: (figure not available)
Example of dimension tables:

Dealer
  Key: dealer_key
  Attributes: region, state, city, dealer

Model
  Key: model_key
  Attributes: brand, category, line, model

Time
  Key: time_key
  Attributes: year, quarter, month, date
Slowly Changing Dimensions
Dimension source data may change over time
Relative to fact tables, dimension records change slowly
Allows dimensions to have multiple 'profiles' over time to
maintain history
Each profile is a separate record in a dimension table

Slowly Changing Dimension
Example: A woman gets married
Possible changes to customer dimension
1) Last name
2) Marital status
3) Address
4) Household income
Existing facts need to remain associated with
her single profile
New facts need to be associated with her
married profile

Slowly Changing Dimension
Types
Three types of slowly changing dimensions
Type 1
Updates existing record with modifications
Does not maintain history
Type 2
Adds new record
Does maintain history
Maintains old record
Type 3
Keeps old and new values in the existing row
Requires a design change (a new column to hold the old value)
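
The three types can be sketched in plain Python on a one-row customer dimension. The column names (customer_key, customer_id, is_current, previous_last_name) and the sample values are hypothetical; production loads usually also track effective dates, which are omitted here.

```python
from copy import deepcopy

# One customer profile in a hypothetical customer dimension.
base = [{"customer_key": 1, "customer_id": "C100", "last_name": "Smith",
         "marital_status": "Single", "is_current": True}]


def scd_type1(dim, customer_id, changes):
    """Type 1: overwrite in place; the old values are lost."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row.update(changes)
    return dim


def scd_type2(dim, customer_id, changes):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in dim
                   if r["customer_id"] == customer_id and r["is_current"])
    current["is_current"] = False
    new_row = {**current, **changes,
               "customer_key": max(r["customer_key"] for r in dim) + 1,
               "is_current": True}
    dim.append(new_row)
    return dim


def scd_type3(dim, customer_id, column, new_value):
    """Type 3: keep the prior value in an extra 'previous_<column>' field of the same row."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["previous_" + column] = row[column]
            row[column] = new_value
    return dim


married = {"last_name": "Jones", "marital_status": "Married"}
t1 = scd_type1(deepcopy(base), "C100", married)              # 1 row, history lost
t2 = scd_type2(deepcopy(base), "C100", married)              # 2 rows, history kept
t3 = scd_type3(deepcopy(base), "C100", "last_name", "Jones")  # 1 row, old value in new column
```

With Type 2, facts recorded before the marriage keep pointing at customer_key 1 (the single profile), while new facts use customer_key 2 (the married profile).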

Degenerate Dimensions
A degenerate dimension is a dimension that is derived
from the fact table and doesn't have its own dimension
table.
Stored in the fact table
Common examples include invoice numbers or order
numbers
Use - Degenerate dimensions are often used to provide
a direct reference back to a transactional system
without the overhead of maintaining a separate dimension
table.
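
As a small illustration, the fact rows below carry an order number directly, with no order dimension table behind it. The rows and the order_number format are made up for this sketch.

```python
from collections import defaultdict

# Hypothetical order-line fact rows. order_number is a degenerate dimension:
# it lives in the fact table and there is no order dimension table to join to.
order_line_facts = [
    {"order_number": "SO-1001", "model_key": 1, "time_key": 1, "revenue": 20000.0, "quantity": 1},
    {"order_number": "SO-1001", "model_key": 2, "time_key": 1, "revenue": 35000.0, "quantity": 1},
    {"order_number": "SO-1002", "model_key": 1, "time_key": 2, "revenue": 40000.0, "quantity": 2},
]

# Group by the degenerate dimension to get back to the original transaction.
revenue_by_order = defaultdict(float)
for row in order_line_facts:
    revenue_by_order[row["order_number"]] += row["revenue"]
```

Here revenue_by_order maps each order number to its total revenue, which is the kind of direct reference back to the transactional system that the slide describes.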

Conformed Dimensions
A dimension that has exactly the same meaning and
content when referenced from different fact tables.
Example: Cube-1 contains fact F1 with dimensions D1, D2, D3,
and Cube-2 contains fact F2 with dimensions D1, D2, D4.
D1 and D2 are the conformed dimensions.
E.g.: Time dimension
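
A rough sketch of why conformed dimensions matter: two fact tables that share the same time dimension can be summarized by the same attribute and then lined up side by side. The returns_facts table and all values are hypothetical.

```python
# Shared (conformed) time dimension, keyed by time_key.
time_dim = {1: {"year": 2024, "quarter": "Q1"},
            2: {"year": 2024, "quarter": "Q2"}}

# Two hypothetical fact tables that both reference time_dim.
sales_facts = [{"time_key": 1, "revenue": 55000.0},
               {"time_key": 2, "revenue": 40000.0}]
returns_facts = [{"time_key": 1, "returned_quantity": 1},
                 {"time_key": 2, "returned_quantity": 3}]


def by_quarter(facts, measure):
    """Sum one measure per quarter, looking the quarter up in the shared dimension."""
    totals = {}
    for row in facts:
        quarter = time_dim[row["time_key"]]["quarter"]
        totals[quarter] = totals.get(quarter, 0) + row[measure]
    return totals


revenue = by_quarter(sales_facts, "revenue")
returned = by_quarter(returns_facts, "returned_quantity")

# Because both results use the same quarter values, they can be combined directly.
combined = {q: (revenue[q], returned[q])
            for q in sorted(revenue.keys() & returned.keys())}
```

If the two fact tables used differently defined time dimensions, the quarter labels would not be guaranteed to match and the final step would not be reliable.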


Fact table
A fact table consists of the measurements, metrics or facts
of a business process.
Fact tables are often defined by their grain.
Grain
The level of detail represented by a row in the fact table
Must be identified early



Example of Fact table

Sales Facts
  Keys: model_key, dealer_key, time_key
  Measures: revenue, quantity






Facts
Fully additive
Can be summed across any and all dimensions
Stored in fact table
Examples: revenue, quantity, sales_amount



Facts
Semi-additive
Semi-additive facts are facts that can be
summed up for some of the dimensions in the
fact table, but not the others.
Example: an account balance can be summed across
accounts, but not across time



Facts
Non-additive
Non-additive facts are facts that cannot be
summed up for any of the dimensions present in
the fact table.
All ratios are non-additive
Examples: Age, weather
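
The three kinds of additivity can be checked with a few lines of Python. The sample rows (daily account balances and dealer sales) are invented for the sketch.

```python
# Hypothetical fact rows.
sales = [
    {"dealer": "D1", "revenue": 200.0, "quantity": 4},
    {"dealer": "D2", "revenue": 300.0, "quantity": 10},
]
balances = [
    {"date": "2024-01-01", "account": "A", "balance": 100},
    {"date": "2024-01-01", "account": "B", "balance": 50},
    {"date": "2024-01-02", "account": "A", "balance": 120},
    {"date": "2024-01-02", "account": "B", "balance": 50},
]

# Fully additive: revenue and quantity can be summed across any dimension.
total_revenue = sum(r["revenue"] for r in sales)
total_quantity = sum(r["quantity"] for r in sales)

# Semi-additive: balances add up across accounts for one date...
total_on_jan2 = sum(b["balance"] for b in balances if b["date"] == "2024-01-02")
# ...but summing one account across dates gives a number with no business meaning.
misleading_sum_for_a = sum(b["balance"] for b in balances if b["account"] == "A")

# Non-additive: a ratio such as unit price cannot be summed at all.
unit_prices = [r["revenue"] / r["quantity"] for r in sales]
wrong_total_price = sum(unit_prices)
# Recompute the ratio from the additive components instead.
overall_unit_price = total_revenue / total_quantity
```

The usual design consequence is to store the additive components (here revenue and quantity) in the fact table and derive ratios at query time.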






Schemas in Data Warehouses


A schema is a collection of database
objects, including tables, views, indexes,
and synonyms.
There is a variety of ways of arranging
schema objects in the schema models
designed for data warehousing.
- Star schema
- Snowflake schema






STAR Schema


The star schema (also called star-join schema or
multidimensional schema) is the simplest style of data
warehouse schema. The star schema consists of one or
more fact tables referencing any number of dimension
tables.
The main advantages of star schemas are that they:
- Provide highly optimized performance for typical star
queries.
- Are widely supported by a large number of business
intelligence tools.
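
A minimal star schema sketch in SQLite, built from the Dealer, Model, Time, and Sales Facts examples used earlier in the deck. The _dim table suffix, the data types, and every sample row are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);
CREATE TABLE model_dim  (model_key  INTEGER PRIMARY KEY, brand TEXT, category TEXT, line TEXT, model TEXT);
CREATE TABLE time_dim   (time_key   INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);

-- The fact table sits in the middle and references every dimension.
CREATE TABLE sales_facts (
    model_key  INTEGER REFERENCES model_dim (model_key),
    dealer_key INTEGER REFERENCES dealer_dim (dealer_key),
    time_key   INTEGER REFERENCES time_dim (time_key),
    revenue    REAL,
    quantity   INTEGER
);

INSERT INTO dealer_dim VALUES (1, 'East', 'NY', 'Albany', 'Dealer A'),
                              (2, 'West', 'CA', 'Fresno', 'Dealer B');
INSERT INTO model_dim  VALUES (1, 'BrandX', 'Sedan', 'LineS', 'S1'),
                              (2, 'BrandX', 'Truck', 'LineT', 'T1');
INSERT INTO time_dim   VALUES (1, 2024, 'Q1', 'Jan', '2024-01-15'),
                              (2, 2024, 'Q2', 'Apr', '2024-04-10');
INSERT INTO sales_facts VALUES (1, 1, 1, 20000.0, 1),
                               (2, 1, 1, 35000.0, 1),
                               (1, 2, 2, 40000.0, 2);
""")

# A typical star query: join the fact table to the dimensions it needs,
# group by dimension attributes, and aggregate the measures.
star_query = """
SELECT d.region, t.quarter, SUM(f.revenue) AS revenue, SUM(f.quantity) AS quantity
FROM sales_facts f
JOIN dealer_dim d ON d.dealer_key = f.dealer_key
JOIN time_dim   t ON t.time_key   = f.time_key
GROUP BY d.region, t.quarter
ORDER BY d.region, t.quarter
"""
result = conn.execute(star_query).fetchall()
# result == [('East', 'Q1', 55000.0, 2), ('West', 'Q2', 40000.0, 2)]
```

Every dimension is exactly one join away from the fact table, which is what keeps star queries simple to write and predictable to optimize.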














Snowflake Schema


The snowflake schema is similar to the star schema.
However, in the snowflake schema, dimensions are
normalized into multiple related tables, whereas the star
schema's dimensions are denormalized with each
dimension represented by a single table.
Advantages of using the snowflake schema:
- Easier to maintain.
- Increases flexibility.
Disadvantages of using the snowflake schema:
- Increases the number of tables an end user must work
with.
- Makes queries more difficult to create because
more tables need to be joined.
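
To make the trade-off concrete, here is a sketch that snowflakes the Model dimension into three related tables. How the hierarchy is split (model, line, category) and all table names and rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The model dimension normalized into a chain of related tables.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category TEXT, brand TEXT);
CREATE TABLE line_dim     (line_key INTEGER PRIMARY KEY, line TEXT,
                           category_key INTEGER REFERENCES category_dim (category_key));
CREATE TABLE model_dim    (model_key INTEGER PRIMARY KEY, model TEXT,
                           line_key INTEGER REFERENCES line_dim (line_key));
CREATE TABLE sales_facts  (model_key INTEGER REFERENCES model_dim (model_key),
                           revenue REAL, quantity INTEGER);

INSERT INTO category_dim VALUES (1, 'Sedan', 'BrandX'), (2, 'Truck', 'BrandX');
INSERT INTO line_dim     VALUES (1, 'LineS', 1), (2, 'LineT', 2);
INSERT INTO model_dim    VALUES (1, 'S1', 1), (2, 'S2', 1), (3, 'T1', 2);
INSERT INTO sales_facts  VALUES (1, 20000.0, 1), (2, 22000.0, 1), (3, 35000.0, 1);
""")

# Revenue by category now needs three joins; with a single denormalized
# model dimension (star schema) it would need one.
snowflake_query = """
SELECT c.category, SUM(f.revenue) AS revenue
FROM sales_facts f
JOIN model_dim    m ON m.model_key    = f.model_key
JOIN line_dim     l ON l.line_key     = m.line_key
JOIN category_dim c ON c.category_key = l.category_key
GROUP BY c.category
ORDER BY c.category
"""
revenue_by_category = conn.execute(snowflake_query).fetchall()
# revenue_by_category == [('Sedan', 42000.0), ('Truck', 35000.0)]
```

The category name is stored once in category_dim, which is the maintenance benefit; the longer join path is the cost that end users pay.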













Designing a Star Schema
Five initial design steps
Based on Kimball's six steps
Start designing in order
Re-visit and adjust over project life

Step One
1. Identify fact table
Start by naming the fact table with the name
of the business subject area

Step Two
2. Identify fact table grain
Describe what a row in the fact table
represents, in business terms

Step Three
3. Identify dimensions

Step Four
4. Select facts

Step Five
5. Identify dimensional attributes
