
1/17/2013

BUSINESS DATA
MANAGEMENT

DIMENSIONAL MODELING
PROF. DRAŽENA GAŠPAR
15.01.2013.

INFORMATION
New teaching plan: accelerated schedule
Software for building the cube

http://www.bi-lite.com/product/DownloadCUBEitZERO.aspx

CUBE-it Zero Foundation - free


ADVANCED CONCEPTS
1. Degenerate dimension
2. Snowflake schema (snowflaking)
3. Too many dimensions
4. Surrogate keys
5. Periodic snapshot
6. Semi-additive values (facts)
7. Data warehouse bus matrix
8. Conformed dimensions/facts
9. Slowly changing dimensions
10. Multi-valued dimensions

DEGENERATE DIMENSION

Dimension table without attributes.
A degenerate dimension is data that is dimensional in nature but stored in a fact table.
Example: a dimension that only has Order Number and Order Line Number
1:1 relationship with the Fact table
CONSEQUENCES?


DEGENERATE DIMENSION

Consequence:
Kept as a separate dimension table, this gives two tables with a billion rows instead of one table with a billion rows.
Instead, it is treated as a degenerate dimension: Order Number and Order Line Number are stored in the Fact table.

DEGENERATE DIMENSION

Degenerate dimensions commonly occur when the fact table's grain is a single transaction (or transaction line).
Transaction control header numbers assigned by the operational business process are typically degenerate dimensions, such as order, ticket, credit card transaction, or check numbers.
These degenerate dimensions are natural keys of the "parents" of the line items.


DEGENERATE DIMENSIONS

Example:

ORDERS TRANSACTIONS
order#
customer id
customer lname
customer fname
shipto street address
shipto city
shipto state
shipto zip
order total amount
discount amount
net order amount
payment amount
order date

ORDERS FACTS
customer key
shipto address key
order date key
order total amount
discount amount
net order amount
payment amount
order#

DIM CUSTOMER
Customer key
customer id
customer lname
customer fname

DIM SHIPTO ADDRESS
Shipto address key
shipto street address
shipto city
shipto state
shipto zip

DIM Order Date
Order date key
Calendar date
Calendar month
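
As a rough sketch of the example above, the following Python snippet loads one ORDERS TRANSACTIONS record into the dimensional layout. The sample values and the helper names (`lookup_or_add`, `load_order`) are invented for illustration and are not part of the slides. The point is only that `order#` is copied straight into the fact row and gets no dimension table of its own.

```python
# Hypothetical sample row from ORDERS TRANSACTIONS (all values are invented).
transaction = {
    "order#": "SO-1001",
    "customer id": "C-17",
    "customer lname": "Smith",
    "customer fname": "Ana",
    "shipto street address": "1 Main St",
    "shipto city": "Mostar",
    "shipto state": "BA",
    "shipto zip": "88000",
    "order total amount": 120.0,
    "discount amount": 20.0,
    "net order amount": 100.0,
    "payment amount": 100.0,
    "order date": "2013-01-15",
}

dim_customer, dim_shipto, dim_date = {}, {}, {}


def lookup_or_add(dimension, natural_key, attributes):
    """Return the surrogate key for natural_key, adding a row if it is new."""
    if natural_key not in dimension:
        dimension[natural_key] = {"key": len(dimension) + 1, **attributes}
    return dimension[natural_key]["key"]


def load_order(t):
    """Split one transaction into dimension rows and one fact row."""
    customer_key = lookup_or_add(dim_customer, t["customer id"], {
        "customer id": t["customer id"],
        "customer lname": t["customer lname"],
        "customer fname": t["customer fname"],
    })
    address = (t["shipto street address"], t["shipto city"],
               t["shipto state"], t["shipto zip"])
    shipto_key = lookup_or_add(dim_shipto, address, {
        "shipto street address": address[0],
        "shipto city": address[1],
        "shipto state": address[2],
        "shipto zip": address[3],
    })
    date_key = lookup_or_add(dim_date, t["order date"], {
        "calendar date": t["order date"],
        "calendar month": t["order date"][:7],
    })
    return {
        "customer key": customer_key,
        "shipto address key": shipto_key,
        "order date key": date_key,
        "order total amount": t["order total amount"],
        "discount amount": t["discount amount"],
        "net order amount": t["net order amount"],
        "payment amount": t["payment amount"],
        # Degenerate dimension: stored in the fact row, no dimension table.
        "order#": t["order#"],
    }


fact = load_order(transaction)
```

Loading a second order for the same customer and address would reuse the existing dimension rows, while its own `order#` would again live only in the new fact row.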

SNOWFLAKING
Normalized star schema


SNOWFLAKING
Problems:
Increases complexity for users
Decreases performance (numerous tables and joins)
Slows down the user's ability to browse within a dimension (example of problem: all brands within a category)

TOO MANY DIMENSIONS


A very large number of dimensions typically is a sign that several dimensions are not completely independent and should be combined into a single dimension.
It is a dimensional modeling mistake to represent elements of a hierarchy as separate dimensions in the fact table.

SURROGATE KEYS

Surrogate (artificial, nonnatural, synthetic) keys are integers that are assigned sequentially as needed to populate a dimension.
A surrogate key is a substitute for the natural primary key.
It is meaningless.
It is just a unique identifier or number for each row that can be used as the primary key of the table.
The only requirement for a surrogate primary key is that it is unique for each row in the table.
The surrogate keys merely serve to join dimension tables to the fact table.
It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, and this makes updates more difficult.
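
A minimal sketch of sequential surrogate key assignment, assuming a toy in-memory customer dimension. The class name `CustomerDimension`, its methods and the sample customer numbers are hypothetical. It illustrates why a change to the natural key does not require touching the fact rows.

```python
import itertools


class CustomerDimension:
    """Toy dimension that hands out sequential integer surrogate keys."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self.rows = {}          # surrogate key -> attributes
        self._by_natural = {}   # natural key -> surrogate key

    def add(self, customer_number, name):
        key = next(self._next_key)
        self.rows[key] = {"customer_number": customer_number, "name": name}
        self._by_natural[customer_number] = key
        return key

    def change_natural_key(self, old_number, new_number):
        """The natural key changes; the surrogate key stays the same."""
        key = self._by_natural.pop(old_number)
        self.rows[key]["customer_number"] = new_number
        self._by_natural[new_number] = key
        return key


dim = CustomerDimension()
k1 = dim.add("CUST-001", "Ana")
k2 = dim.add("CUST-002", "Ivan")

# Fact rows reference only the surrogate keys.
facts = [
    {"customer_key": k1, "amount": 50.0},
    {"customer_key": k2, "amount": 75.0},
]

# The operational system renumbers a customer; no fact row is updated.
dim.change_natural_key("CUST-001", "HR-000001")

joined = [
    (dim.rows[f["customer_key"]]["customer_number"], f["amount"])
    for f in facts
]
```

After the renumbering, `joined` still pairs every fact with the right customer, because the join runs through the unchanged surrogate key.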


SURROGATE KEYS

Advantages of using surrogate keys

Performance
- Efficient joins
- Smaller indexes
- More rows per block

Data integrity
- When the keys in operational systems are reused
- Discontinued products, deceased customers, etc.

Mapping when integrating data from different sources
- Keys from different sources may be different
- Mapping table of the surrogate key and keys from different sources

SURROGATE KEYS

Advantages of using surrogate keys (Cont)

Handling unknown or N/A values
- Easy to assign a surrogate key value to rows with these values

Tracking changes in dimensional attribute values
- Creating a new row with the new attribute values and assigning it the next available surrogate key


SURROGATE KEYS

Disadvantages of using surrogate keys

Assignment and management of surrogate keys, and appropriate substitution of these keys for natural keys: extra load for the ETL system
- Many ETL tools have built-in capabilities to support surrogate key processing
- Once the process is developed, it can be easily reused for other dimensions

PERIODIC SNAPSHOT
At predetermined intervals, snapshots of the same level of detail are taken and stacked consecutively in the fact table
Example: most financial reports, bank account value, inventory level
Complements detailed transaction facts but does not replace them
Shares the same conformed dimensions but has fewer dimensions
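
One way to picture the stacking, as a sketch under assumed data: account balances are derived from invented transaction-grain rows at two month-end snapshot dates. The function name `build_snapshots` and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction-grain facts: (date, account, amount).
transactions = [
    ("2013-01-05", "A", 100.0),
    ("2013-01-20", "A", -30.0),
    ("2013-01-25", "B", 200.0),
    ("2013-02-10", "A", 50.0),
]

# Predetermined snapshot intervals (month ends here).
snapshot_dates = ["2013-01-31", "2013-02-28"]


def build_snapshots(transactions, snapshot_dates):
    """Stack one row per (snapshot date, account) with the balance then."""
    accounts = sorted({account for _, account, _ in transactions})
    rows = []
    for snap in snapshot_dates:
        balance = defaultdict(float)
        for date, account, amount in transactions:
            if date <= snap:  # ISO date strings compare chronologically
                balance[account] += amount
        rows.extend((snap, account, balance[account]) for account in accounts)
    return rows


snapshot_fact = build_snapshots(transactions, snapshot_dates)
```

Account B has no February transaction, yet it still gets a February snapshot row. That is the sense in which the snapshot complements the transaction facts instead of replacing them.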


TYPES OF FACTS
There are three types of facts:
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

ADDITIVE FACTS
Date
Store
Product
Sales_Amount

The purpose of this table is to record the sales amount for each product in each store on a daily basis.
Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table: date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represents the total sales amount for that week.
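
A small sketch of what "additive along any dimension" means, using invented rows for the Date / Store / Product / Sales_Amount table. The helper `total_by` is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (date, store, product, sales_amount).
sales = [
    ("2013-01-14", "S1", "P1", 10.0),
    ("2013-01-14", "S2", "P1", 20.0),
    ("2013-01-15", "S1", "P2", 5.0),
    ("2013-01-15", "S2", "P2", 15.0),
]


def total_by(rows, column):
    """Sum Sales_Amount grouped by the dimension at the given column index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row[3]
    return dict(totals)


by_date = total_by(sales, 0)
by_store = total_by(sales, 1)
by_product = total_by(sales, 2)
```

Whichever dimension is used for grouping, the group totals add up to the same grand total, which is what makes the fact additive.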


SEMI-ADDITIVE AND NON-ADDITIVE FACTS

Date
Account
Current_Balance
Profit_Margin

The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day.
Current_Balance and Profit_Margin are the facts.
Current_Balance is a semi-additive fact: it makes sense to add it up across all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up all current balances for a given account for each day of the month does not give us any useful information).
Profit_Margin is a non-additive fact: it does not make sense to add it up at either the account level or the day level.
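
The distinction can be sketched with invented numbers. The `revenue` and `profit` columns below are assumptions added only so that a margin can be recomputed; they are not part of the table on the slide.

```python
# Hypothetical rows: (date, account, current_balance, revenue, profit).
rows = [
    ("2013-01-01", "A", 100.0, 200.0, 20.0),
    ("2013-01-01", "B", 300.0, 100.0, 40.0),
    ("2013-01-02", "A", 150.0, 200.0, 20.0),
    ("2013-01-02", "B", 300.0, 100.0, 40.0),
]

# Semi-additive: summing balances across accounts for one day is meaningful.
bank_total_jan1 = sum(r[2] for r in rows if r[0] == "2013-01-01")

# Across time a sum is not meaningful; an average or the closing balance is.
balances_a = [r[2] for r in rows if r[1] == "A"]  # rows are in date order
average_balance_a = sum(balances_a) / len(balances_a)
closing_balance_a = balances_a[-1]

# Non-additive: per-account margins (profit / revenue) cannot be summed.
jan1 = [r for r in rows if r[0] == "2013-01-01"]
sum_of_margins = sum(r[4] / r[3] for r in jan1)
true_margin = sum(r[4] for r in jan1) / sum(r[3] for r in jan1)
```

Here the summed margins come to about 0.5 while the margin recomputed from total profit and total revenue is 0.2, so a ratio of this kind has to be recalculated from its additive components.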

CONFORMED DIMENSIONS/FACTS

Master or common reference dimensions
Shared across the DW environment, joining to multiple fact tables representing various business processes
2 types
- Identical dimensions
- One dimension being a subset of a more detailed dimension


CONFORMED DIMENSIONS/FACTS

Identical dimensions
- Same content, interpretation, and presentation regardless of the business process involved
- Same keys, attribute names, attribute definitions, and domain values regardless of the fact table they join to
- Example: product dimension referenced by orders and the one referenced by inventory are identical

One dimension being a perfect subset of a more detailed, granular dimension table
- Same attribute names, definitions, and domain values
- Example: sales is linked to a dimension table at the individual product level; sales forecast is linked at the brand level

CONFORMED DIMENSIONS

Sales Fact Table
Date key FK
Product key FK
other FKeys
Sales quantity
Sales amount

Sales Forecast Fact Table
Month key FK
Brand key FK
other FKeys
Forecast quantity
Forecast amount

Product Dimension
Product key PK
Product description
SKU number
Brand description
Sub class description
Class description
Department description
Color
Size
Display type

Brand Dimension
Brand key PK
Brand description
Sub class description
Class description
Department description
Display type


CONFORMED DIMENSIONS

Benefits

Consistency
- Every fact table is filtered consistently and results are labeled consistently

Integration
- Users can create queries that drill across fact tables representing different processes individually and then join the result sets on common dimension attributes

Reduced development time to market
- Once created, conformed dimensions are reused
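
The drill-across idea can be sketched as follows, assuming tiny invented versions of the sales and sales forecast fact tables and of the product and brand dimensions. All keys, brand names and quantities are hypothetical. Each fact table is summarized on its own, and the two result sets are then joined on the shared brand attribute.

```python
from collections import defaultdict

# Hypothetical conformed dimensions: the brand attribute has the same
# name and domain values in both tables.
product_dim = {1: {"brand": "Acme"}, 2: {"brand": "Acme"}, 3: {"brand": "Zen"}}
brand_dim = {10: {"brand": "Acme"}, 20: {"brand": "Zen"}}

sales_fact = [(1, 5), (2, 7), (3, 4)]   # (product key, sales quantity)
forecast_fact = [(10, 10), (20, 6)]     # (brand key, forecast quantity)


def total_by_brand(fact_rows, dimension):
    """Query one fact table on its own, grouped by the brand attribute."""
    totals = defaultdict(int)
    for key, quantity in fact_rows:
        totals[dimension[key]["brand"]] += quantity
    return totals


sales = total_by_brand(sales_fact, product_dim)
forecast = total_by_brand(forecast_fact, brand_dim)

# Join the two result sets on the common dimension attribute.
drill_across = {
    brand: {"sales": sales.get(brand, 0), "forecast": forecast.get(brand, 0)}
    for brand in sorted(set(sales) | set(forecast))
}
```

The join only lines up because both dimensions use identical brand values. With non-conformed labels (for example "Acme" in one and "ACME Inc." in the other) the result sets would not match.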

CONFORMED FACTS

If facts do live in more than one fact table, the underlying definitions and equations for these facts must be the same if they are to be called the same thing.
If facts are labeled identically, then they need to be defined in the same dimensional context and with the same units of measure from data mart to data mart.
Examples: revenue, profit, standard prices, standard costs, measures of quality, measures of customer satisfaction and other KPIs.


SLOWLY CHANGING DIMENSIONS

Dimension table attributes change infrequently
Mini-dimensions
- Separating more frequently changing attributes into their own separate dimension table (mini-dimension)
3 techniques for handling slowly changing dimensions
- Overwrite the dimension attribute
- Add a new dimension row
- Add a new dimension attribute


SLOWLY CHANGING DIMENSIONS - OVERWRITE THE DIMENSION ATTRIBUTE

New values overwrite old ones
No history is kept
Problems occur if data was previously aggregated based on old values
- Will not match ad-hoc aggregations based on new values
- Previous aggregations need to be updated to keep aggregated data in sync
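
A sketch of the aggregation problem, assuming an invented product dimension with a `department` attribute and a stored aggregate by department. All names and values are hypothetical.

```python
# Hypothetical dimension row and fact rows.
product_dim = {1: {"product": "Widget", "department": "Toys"}}
sales_fact = [(1, 100.0), (1, 50.0)]  # (product key, sales amount)


def sales_by_department():
    """Ad-hoc aggregation using whatever the dimension says right now."""
    totals = {}
    for key, amount in sales_fact:
        department = product_dim[key]["department"]
        totals[department] = totals.get(department, 0.0) + amount
    return totals


# An aggregate is computed and stored while the old value is in place.
stored_aggregate = sales_by_department()

# Overwrite the attribute: the old value is gone, no history is kept.
product_dim[1]["department"] = "Games"

# The stored aggregate no longer matches an ad-hoc aggregation.
stale_aggregate = dict(stored_aggregate)
adhoc_aggregate = sales_by_department()
in_sync_before_rebuild = stale_aggregate == adhoc_aggregate

# Rebuilding the stored aggregate brings the two back in sync.
stored_aggregate = sales_by_department()
```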

SLOWLY CHANGING DIMENSIONS - ADD A NEW DIMENSION ROW

Most popular technique
New row with a new surrogate PK is inserted into the dimension table to reflect new attribute values
Both old and new values are stored, along with effective and expiration dates and the current row indicator
Example:
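
The original figure for this example is not available in the text, so the following is only an illustrative sketch of the technique, not the slide's own example. The column names, the `HIGH_DATE` placeholder for "not yet expired", and the choice to expire the old row on the change date are all assumptions.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "not yet expired"

# Hypothetical customer dimension with one current row.
customer_dim = [
    {"customer_key": 1, "customer_id": "C-17", "city": "Mostar",
     "effective": date(2012, 1, 1), "expiration": HIGH_DATE, "current": True},
]


def apply_change(dim, customer_id, new_city, change_date):
    """Expire the current row and insert a new row with a new surrogate key."""
    current = next(
        row for row in dim
        if row["customer_id"] == customer_id and row["current"]
    )
    current["expiration"] = change_date
    current["current"] = False

    new_row = {
        **current,
        "customer_key": max(row["customer_key"] for row in dim) + 1,
        "city": new_city,
        "effective": change_date,
        "expiration": HIGH_DATE,
        "current": True,
    }
    dim.append(new_row)
    return new_row["customer_key"]


new_key = apply_change(customer_dim, "C-17", "Split", date(2013, 1, 15))
```

Fact rows loaded before the change keep pointing at surrogate key 1 (the old city), and rows loaded afterwards use the new key, which is how history is preserved.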


SLOWLY CHANGING DIMENSIONS - ADD A NEW DIMENSION ATTRIBUTE

Used infrequently
A new column is added to the dimension table
- Old value is recorded in a prior attribute column
- New value is recorded in the existing column
All BI applications transparently use the new attribute
Queries can be written to access values stored in the prior attribute column
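
A minimal sketch of this technique on an invented product dimension. The `prior_` column naming and the helper functions are assumptions made for illustration.

```python
# Hypothetical product dimension keyed by surrogate key.
product_dim = {1: {"product": "Widget", "department": "Toys"}}


def add_prior_attribute(dim, attribute):
    """Add the new 'prior' column to every row (empty where unused)."""
    prior = f"prior_{attribute}"
    for row in dim.values():
        row.setdefault(prior, None)
    return prior


def change_attribute(dim, key, attribute, new_value):
    """Move the old value to the prior column, then store the new value."""
    prior = add_prior_attribute(dim, attribute)
    row = dim[key]
    row[prior] = row[attribute]
    row[attribute] = new_value


change_attribute(product_dim, 1, "department", "Games")

# Existing queries keep reading "department" and see the new value;
# a query that needs the old view reads "prior_department" instead.
current_view = product_dim[1]["department"]
prior_view = product_dim[1]["prior_department"]
```

Only one prior value per attribute is kept this way, so a second change would overwrite the first prior value.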

DATA WAREHOUSE BUS ARCHITECTURE

Cannot build the enterprise data warehouse in one step.
Building isolated pieces will defeat the consistency goal.
Need an architected incremental approach: the data warehouse bus architecture.
By defining a standard bus interface for the data warehouse environment, separate data marts can be implemented by different groups at different times. The separate data marts can be plugged together and usefully coexist if they adhere to the standard.


DATA WAREHOUSE BUS ARCHITECTURE



[Figure: data marts for Purchase Orders, Store Inventory and Store Sales connected to the shared dimensions Date, Product, Store, Promotion, Warehouse, Vendor and Shipper]


DATA WAREHOUSE BUS ARCHITECTURE

During the architecture phase, the team designs a master suite of standardized dimensions and facts that have uniform interpretation across the enterprise.
Separate data marts are then developed adhering to this architecture.


ENTERPRISE BUS ARCHITECTURE

Requirements are gathered and represented in the form of an Enterprise Data Warehouse Bus Matrix
- Each row corresponds to a business process
- Each column corresponds to a dimension of the business
- Each column is a conformed dimension
The Enterprise Data Warehouse Bus Matrix documents the overall data architecture for the DW/BI system
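
A bus matrix can be held in a very simple data structure. The sketch below reuses the process and dimension names from the bus architecture figure earlier in the deck, but which dimension belongs to which process is not recoverable from the slides, so the assignments here are assumptions made purely for illustration.

```python
# Rows: business processes. Columns: dimensions.
# NOTE: the row/column assignments are assumed, not taken from the slides.
bus_matrix = {
    "Purchase Orders": {"Date", "Product", "Warehouse", "Vendor", "Shipper"},
    "Store Inventory": {"Date", "Product", "Store"},
    "Store Sales": {"Date", "Product", "Store", "Promotion"},
}


def processes_by_dimension(matrix):
    """Invert the matrix: for each dimension, list the processes using it."""
    result = {}
    for process, dimensions in matrix.items():
        for dimension in dimensions:
            result.setdefault(dimension, []).append(process)
    return result


usage = processes_by_dimension(bus_matrix)

# Dimensions used by more than one process must be conformed.
shared = sorted(d for d, processes in usage.items() if len(processes) > 1)
```

Reading the structure by column shows where conformance matters most: a dimension used by several processes has to be designed once and shared.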


ENTERPRISE BUS ARCHITECTURE MATRIX

Possible Problems:

Level of detail for each column and row in the matrix

Row-related
- Listing departments/imitating the organizational chart instead of business processes
- Listing reports and analytics related to a business process instead of the business process itself
- Ex. Shipping orders business process supports various analytics such as customer ranking, sales rep performance, product movement analyses

ENTERPRISE BUS ARCHITECTURE MATRIX

Possible Problems (Cont):

Column-related
- Generalized columns/dimensions
- Example: Entity column is too general as it includes employees, suppliers, contractors, vendors, customers
- Too many columns related to the same dimension
- Worst case when each attribute is listed separately
- Example: Product, Product Group, LOB are all related to the Product dimension and should be listed as one.


DIMENSIONAL MODELING MISTAKES TO AVOID

Place text attributes used for constraining and grouping in a fact table
Limit verbose descriptive attributes in dimensions to save space
Split hierarchies and hierarchy levels into multiple dimensions
Ignore the need to track dimension attribute changes
Solve all query performance problems by adding more hardware
Use operational or smart keys to join dimension tables to a fact table
Neglect to declare and then comply with the fact table's grain
Design the dimensional model based on a specific report
Expect users to query the lowest-level atomic data in a normalized format
Fail to conform facts and dimensions across separate fact tables

DW 2.0
Modeling process


DW 2.0 - MODELING
The starting point for DW 2.0 is the modeling process.
2 basic models:
- Process model
- Data model
The process model applies to the data mart environment.
The data model applies to the integrated sector, the near line sector and the archival sector.


CORPORATE DATA MODEL

Corporate data model must have identified and structured the following:
- the major subjects of the enterprise,
- the relationships between the subjects,
- the creation of an ERD (entity relationship diagram),
- for each major subject area:
  - the key(s) of the subject,
  - the attributes of the subject,
  - the subtypes of the subject,
  - the connectors of one subject area to the next,
  - the grouping of attributes.


CORPORATE DATA MODEL


The process analysis is interesting but usually is only an adjunct to the
corporate data model because the process analysis applies directly to
the operational environment, not the data warehouse environment. It
is the corporate data model that forms the backbone of design for the
data warehouse, not the process analysis.
The corporate data model is usually broken into multiple levels - a high
level and a mid level. The high level of the corporate data model
contains the major subject areas and how they relate.

CORPORATE DATA MODEL

Example of a high-level corporate data model

Four subject areas:
- Customer
- Account
- Order
- Product

Direct relationship between customer and account, between account and order, and between order and product.


CORPORATE DATA MODEL


The next level of modeling in the corporate data model is
the mid level of modeling. The mid level of modeling is
the place where much of the detail of the model is found.
The mid level of modeling contains keys, attributes,
subtypes, groupings of attributes, and connectors.

CORPORATE DATA MODEL


There is a relationship between each subject area identified
in the high level model and the mid level models. For
each subject area identified, there is a single mid level
model.


CORPORATE DATA MODEL

Transformation of corporate data model to DW model through activities:
- the removal of purely operational data,
- the addition of an element of time to the key structure of the data warehouse if one is not already present,
- the addition of appropriate derived data,
- the transformation of data relationships into data artifacts,
- accommodating the different levels of granularity found in the data warehouse,
- merging like data from different tables together,
- creation of arrays of data, and
- the separation of data attributes according to their stability characteristics.

CORPORATE DATA MODEL

Removing operational data
- Estimate whether there is a reasonable chance that the data will be used for DSS


CORPORATE DATA MODEL


Adding an element of time to the warehouse key

CORPORATE DATA MODEL

Adding derived data
As a rule, data modelers do not include derived data as part of the data modeling process. Consequently, corporate data models do not contain derived data. The reason for the omission is that when derived data is included, the data model grows to ungainly proportions and is never complete.
The next transformation that must be made to the corporate data model is that of adding derived data to the data warehouse data model where appropriate.
It is appropriate to add derived data to the data warehouse data model where the derived data is frequently accessed and calculated once.
The addition of derived data makes sense because it reduces the amount of processing required when accessing the data in the warehouse. In addition, once the derived data is properly calculated, there is no doubt about the integrity of the calculation: there is no chance that someone will come along and use an incorrect algorithm for the calculation, which enhances the credibility of data in the data warehouse.
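
A sketch of "calculated once, accessed many times", reusing the order amount fields from the ORDERS example earlier in the deck. The formula (net = total minus discount), the function name and the sample rows are assumptions for illustration.

```python
def derive_net_order_amount(order_total, discount):
    """Single agreed definition of the derived value (assumed formula)."""
    return round(order_total - discount, 2)


# Hypothetical source rows without the derived value.
source_rows = [
    {"order#": "SO-1001", "order total amount": 120.0, "discount amount": 20.0},
    {"order#": "SO-1002", "order total amount": 80.0, "discount amount": 0.0},
]

# Calculated once, at load time, and stored with the warehouse row.
warehouse_rows = [
    {
        **row,
        "net order amount": derive_net_order_amount(
            row["order total amount"], row["discount amount"]
        ),
    }
    for row in source_rows
]

# Later queries read the stored value instead of re-deriving it.
total_net = sum(row["net order amount"] for row in warehouse_rows)
```

Because every query reads the stored column, nobody can accidentally apply a different formula at query time.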


CORPORATE DATA MODEL


Changing granularity of data


CORPORATE DATA MODEL

Merging tables

Preconditions:
- Tables share a common key
- Data from different tables is used together frequently
- Pattern of insertion is roughly the same

CORPORATE DATA MODEL


Organizing data according to its stability


Questions?
