
Dimensional Modelling

Dimensional Modelling
Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business.
Dimensional modeling has two basic concepts:
  Facts
  Dimensions
Other related concepts:
  Aggregates
  Meta-data

Fact
Definition
  A fact is a collection of related data items, consisting of measures.
  A fact is a focus of interest for the decision-making process. Measures are continuously valued attributes that describe facts (Golfarelli et al).
  A fact is a business measure (Kimball and Ross).

What exactly is being analysed? What numbers are being analysed?

Examples of Facts
A university provides education services to its students. What are its facts and measures?

Facts                  Measures
Applications           number, revenue from prospectus sales
Enrollment             number, revenue
Student Performance    grades, marks, %age marks, division
Student Placement      designation, nature of job, salary
Student awards         title, amount

Fact

Each fact typically represents
  a business item: an order
  a business transaction: order processing
  an event: arrival of an order
that can be used in analyzing the business or business process

Some Aspects of Facts


A fact is continuously valued. It takes a value from a broad range of values:
  the set of integers
  real numbers
The most useful facts are numeric and additive: we almost never work with a single fact, so facts are usually added up across many records.
Textual facts occur very rarely: their free format and unpredictable contents make them impossible to analyse. Recent interest in unstructured DW looks at these.

Types of Facts
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
  E.g. Sales_Amount along date, product
Semi-additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
  E.g. current_balance along account, not along date
Non-additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
  E.g. percentage or profit margin
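
The three types can be illustrated with a minimal Python sketch. The account names, dates and amounts below are made-up assumptions for illustration, not data from the slides.

```python
from collections import defaultdict

# Hypothetical daily account snapshots: (date, account, deposits, current_balance)
rows = [
    ("2024-01-01", "A", 100, 100),
    ("2024-01-01", "B", 50, 50),
    ("2024-01-02", "A", 20, 120),
    ("2024-01-02", "B", 0, 50),
]

def sum_by(rows, key_index, value_index):
    """Sum one measure grouped by one dimension column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

# Additive: deposits can be summed along date and along account.
deposits_by_date = sum_by(rows, 0, 2)
deposits_by_account = sum_by(rows, 1, 2)

# Semi-additive: current_balance can be summed across accounts for one date ...
balance_by_date = sum_by(rows, 0, 3)
# ... but summing it across dates double-counts, so use the latest snapshot.
latest_date = max(row[0] for row in rows)
closing_balance = sum(row[3] for row in rows if row[0] == latest_date)

# Non-additive: a ratio such as profit margin is recomputed from additive parts.
sales = [(200, 50), (100, 10)]  # hypothetical (revenue, profit) pairs
overall_margin = sum(p for _, p in sales) / sum(r for r, _ in sales)
```

Note that the overall margin (0.2) differs from the plain average of the two row margins (0.175), which is why ratios are stored as their additive components where possible.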

Dimension
Definition

The parameter over which we want to perform analysis of facts
  sales is a fact; perform analysis over region, product, time
The parameter that gives meaning to a measure
  number of customers is a fact; perform analysis over time
Discretely valued description that is more or less constant and participates in constraints
Qualifying characteristics that provide additional perspective to a given fact

Examples of Dimension
A university provides education services to its students. What are its facts and dimensions?

Facts             Dimensions
Applications      Age, Region
Enrollment        Region
Performance       Year, Discipline, Student
Placement         Year, Discipline, Grades
Student awards    Discipline, Year

Dimensions and their Values


Dimension     Dimension Value
Age           10, 11, 12, ...
Region        North, South
Year          1999, 2000, ...
Discipline    ECE, CSE, IT, ...
Grades        A+, A, ...
Student       Name of student

Aspects of Dimensions

The values of dimensions are more or less constant, but they can change with time
  slowly changing dimensions
  rapidly changing dimensions
Need to handle such changes

Dimensions are the primary source of query constraints, report headings, and groupings

Dimension Hierarchies/Categories
Dimensions are composed of smaller units called categories or members
  simpler components forming a hierarchy
    country, zone, branch, unit
    Hierarchies are a basis for drill down and roll-up
  special, notable units
    holidays
    For special queries: sales performance on holidays

Organising Facts and Dimensions


The model should
  provide drill down/roll up along dimension hierarchies
  provide good data access
  be query centric
  be optimised for queries and analysis
Each dimension should be able to interact fully with the fact

The Star Schema


[Figure: a Fact at the centre, linked to five surrounding Dimensions]

A DW is a collection of star schemata

Example: Facts and Dimensions


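[Figure: star schema for the Sales fact (measure: Rupees) with three dimension hierarchies]
  Product: Product type, Product name
  Location: Region, City
  Time: Year, Season, Month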
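
A minimal sketch of how this example could look as a star schema in a relational database, using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions chosen for illustration; the slides only name the fact (Sales, in Rupees) and the hierarchy levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension; each row carries every level of its hierarchy.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, product_type TEXT);
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_month (month_id INTEGER PRIMARY KEY, month TEXT, season TEXT, year INTEGER);

-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    city_id INTEGER REFERENCES dim_city(city_id),
    month_id INTEGER REFERENCES dim_month(month_id),
    rupees REAL,
    PRIMARY KEY (product_id, city_id, month_id)
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Soap', 'Toiletries');
INSERT INTO dim_city VALUES (1, 'Delhi', 'North'), (2, 'Chennai', 'South');
INSERT INTO dim_month VALUES (1, 'January', 'Winter', 2000), (2, 'February', 'Winter', 2000);
INSERT INTO fact_sales VALUES (1, 1, 1, 100.0), (1, 2, 1, 40.0), (2, 1, 2, 60.0);
""")

# Roll up from City to Region by joining the fact to the location dimension.
sales_by_region = dict(conn.execute("""
    SELECT c.region, SUM(f.rupees)
    FROM fact_sales AS f
    JOIN dim_city AS c ON c.city_id = f.city_id
    GROUP BY c.region
""").fetchall())
```

Drilling down is the reverse step: group by `c.city` instead of `c.region` to see the same sales at the finer level.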

Computing Fact Sizes

[Figure: the Sales star schema with the Product type/Product name, Region/City and Year/Season/Month hierarchies]

Let there be
  5000 products
  60 months
  50 cities
Assume one sale fact per product, per city, per month
Number of sales facts = 5000 * 60 * 50 = 15000000

Sparse Facts
Not all 5000 products may be sold each month in each city
Assume that 3000 products are sold each month in each city
Number of sales facts = 3000 * 60 * 50 = 9000000
Approximately 60% of the cube is occupied and 40% is empty
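
The arithmetic above can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones given on the slides.

```python
products, months, cities = 5000, 60, 50

# One potential fact per (product, city, month) cell of the cube.
dense_facts = products * months * cities

# Slide assumption: only 3000 products are sold per city per month.
products_sold = 3000
actual_facts = products_sold * months * cities

# Fraction of the cube that actually holds a fact.
occupancy = actual_facts / dense_facts
```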

Aggregation
Aggregates are pre-calculated summaries along dimension hierarchies derived from basic facts.
Aggregation is performed in order to speed up common queries.
Example: we need the total sales for each region, product-wise and month-wise
  Number of products = 5000
  Number of regions = 5
  Number of months = 60
  Total number of facts = 5000 * 5 * 60 = 1500000
Space-time tradeoff
  if the frequency of use is high then pay the storage expense
Aggregation guideline
  if the number of facts summarised is more than 10, then do aggregation
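
As a rough sketch of what pre-calculating an aggregate means, the Python below rolls city-level facts up to region-level facts. The city names, the city-to-region mapping and the amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical city -> region mapping (one level of the location hierarchy).
city_region = {"Delhi": "North", "Chandigarh": "North", "Chennai": "South"}

# Base facts keyed by (product, city, month); values are sales in rupees.
base_facts = {
    ("Pen", "Delhi", "2000-01"): 100,
    ("Pen", "Chandigarh", "2000-01"): 30,
    ("Pen", "Chennai", "2000-01"): 40,
    ("Soap", "Delhi", "2000-02"): 60,
}

def aggregate_by_region(facts, city_to_region):
    """Pre-compute region-level facts by summing over the cities of each region."""
    totals = defaultdict(int)
    for (product, city, month), rupees in facts.items():
        totals[(product, city_to_region[city], month)] += rupees
    return dict(totals)

region_facts = aggregate_by_region(base_facts, city_region)
```

A query for regional sales can then read `region_facts` directly instead of scanning every city-level fact, which is the space-time tradeoff described above.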

Aggregation
[Figure: aggregation levels along the three hierarchies Year, Season, Month; Region, City; Product type, Product name. Labels: No aggregation, One-way aggregation, Two-way aggregation, Three-way aggregation]

When aggregation is done by rising along n dimensions, n-way aggregation is said to be performed

Sparsity and Aggregation


As the amount of aggregation increases, sparsity decreases
One-way aggregation on regions results in 1.5M facts
  The probability of all 5000 products being sold in a month in a region is higher than that of all 5000 being sold in a month in a city
Two-way aggregation on regions and seasons results in 0.5M facts
  The probability of all 5000 products being sold in a season in a region is higher than that of all 5000 being sold in a month in a region
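
The fact counts quoted above can be reproduced as follows. The figure of 20 seasons is an assumption (4 seasons a year over the 5 years covered by 60 months); the slides do not state it, but it is the value consistent with 0.5M facts.

```python
products = 5000
cities, regions = 50, 5
months, seasons = 60, 20  # assumption: 4 seasons per year over 5 years

# Upper bounds on the number of facts at each level of aggregation.
no_aggregation = products * cities * months    # product x city x month
one_way = products * regions * months          # rise from city to region
two_way = products * regions * seasons         # also rise from month to season
```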

Aggregation and the Star Schema


Each aggregate is a fact with its own derived dimensions
Derived dimensions may be defined on the fly
  Sales summary by quarter, but quarter was not in the original dimension hierarchy
Each aggregate has its own star schema

Metadata

Different definitions:
  Data about the data
  Table of contents for the data
  Catalog for the data
  Data warehouse atlas
  Data warehouse roadmap

Metadata contains the answers to questions about the data in the Data Warehouse

Central Role of Metadata

Metadata for End Users

Example

Entity Name            Customer
Aliases                Account, Client
Definition             Anyone who purchases hotel rooms
Source Systems         Reservations, Accounts, Housekeeping
Create Date            1 January 2000
Last Update Date       13 September 2003
Update Cycle           weekly
Full refresh cycle     six months
Data Quality Review    15 September 2003
Planned Archival       Every six months

Metadata for IT Professionals

Metadata Driven Data Warehouse Process

Data Acquisition Metadata Types

Information Delivery
Functions:
  Report generation
  Query processing
  Complex analysis

Metadata recorded in the information delivery functional area
  relates to predefined queries, predefined reports, and input parameter definitions for queries and reports
  also includes information for OLAP

Information Delivery Metadata Types

Challenges for Metadata Management


Reconcile the formats of metadata of several tools
No industry-wide accepted standards
Centralized metadata repository: a collection of fragmented metadata stores
No easy and accepted methods of passing metadata
Preserving version control of metadata
Unifying the metadata relating to the data sources can be an enormous task

Common Warehouse Model

Foundation Metadata
  Business information about model elements
  Data types
  Keys and Indexes
  Expression
  Software Deployment: software deployed in DW
  Type Mapping: mapping of data types between different systems

Common Warehouse Model


Metadata for Resource
  Relational data sources
  Record data sources
  Multidimensional resources
  XML data sources
Analysis Metadata
  Data transformation tools
  OLAP processing tools
  Data mining tools
  Information visualisation tools
  Business taxonomy and glossary

Common Warehouse Model

Management Metadata
  Warehouse Processes
  Results of Warehouse Operations

The Star Schema Revisited

The Star contains detailed facts and dimensions
Aggregates are facts and have their own dimensions
Meta-data support is built around the star schema

Star Schema: Benefits

Depicts a fuller description of each dimension
Explicitly shows multiple levels of aggregation on each dimension
Depicts multiple facts at the intersection of all dimensions
Directly implementable in a Relational DBMS
Can utilize new, accelerated approaches to indexing (STARindex) and joining (STARjoin)

Dimensional Modelling vs. Spread Sheet


Annual product sales by region ($,000)
==============================================================
            REGION:
PRODUCT:    SOUTHERN    WESTERN    NORTHERN    EASTERN    TOTAL
--------------------------------------------------------------
Stibes      $7,140      $14,790    $13,260     $15,810    $51,000
Farkles     5,460       11,310     10,140      12,090     39,000
Teglers     3,150       6,525      5,850       6,975      22,500
Qwerts      5,250       11,875     10,750      12,625     40,500
--------------------------------------------------------------
TOTALS:     $21,000     $44,500    $40,000     $47,500    $153,000
==============================================================

Is this a Relational Table? What is the Entity? What is the Identifier? What are the Attributes? How to make it a Relational Table?

How many Fact types? How many Dimensions?

Dimensional Modelling vs. Relations


How many Facts? How many Dimensions? What type of Table? What is the Identifier?

REGION      PRODUCT    SALES
Southern    Stibes     $7,140
Southern    Farkles    5,460
Southern    Teglers    3,150
Southern    Qwerts     5,250
Western     Stibes     14,790
Western     Farkles    11,310
Western     Teglers    6,525
Western     Qwerts     11,875
Northern    Stibes     13,260
Northern    Farkles    10,140
Northern    Teglers    5,850
Northern    Qwerts     10,750
Eastern     Stibes     15,810
Eastern     Farkles    12,090
Eastern     Teglers    6,975
Eastern     Qwerts     12,625
(all)       Stibes     51,000
(all)       Qwerts     40,500
Southern    (all)      21,000
(all)       (all)      153,000
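
One way to see the relationship between the spreadsheet and the relational form is to unpivot the grid in code. The sketch below uses the sales figures from the spreadsheet; the function name `total` and the use of `None` to stand for "(all)" are illustrative choices, not part of the slides.

```python
regions = ["Southern", "Western", "Northern", "Eastern"]

# Spreadsheet body: one list of regional sales ($,000) per product.
grid = {
    "Stibes":  [7140, 14790, 13260, 15810],
    "Farkles": [5460, 11310, 10140, 12090],
    "Teglers": [3150, 6525, 5850, 6975],
    "Qwerts":  [5250, 11875, 10750, 12625],
}

# Unpivot: one (region, product, sales) row per spreadsheet cell.
detail_rows = [
    (region, product, sales)
    for product, values in grid.items()
    for region, sales in zip(regions, values)
]

def total(rows, region=None, product=None):
    """Sum sales; None plays the role of '(all)' for that dimension."""
    return sum(
        sales
        for r, p, sales in rows
        if (region is None or r == region) and (product is None or p == product)
    )
```

For example, `total(detail_rows, product="Stibes")` reproduces the "(all), Stibes" row and `total(detail_rows)` reproduces the grand total, so the "(all)" rows are aggregates derived from the 16 detail facts.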

Where are the Dimension Tables?


REGION:
NAME        LEVEL
(all)       1
Southern    2
Western     2
Northern    2
Eastern     2

ER Diagram
REGION
  Region ID
  Region Name

STORE
  Store ID
  Store Name
  Address
  City
  State
  ZipCode
  Region ID (fk)

SALES
  Sales Date
  Store ID (fk)
  Product ID (fk)
  Sale Amount
  Sale Units

DEPARTMENT
  Department ID
  Department Name

PRODUCT GROUP
  Product Group ID
  Product Group Desc.
  Department ID (fk)

PRODUCT
  Product ID
  Product Desc.
  Product Group ID (fk)

INVENTORY
  Week
  Store ID (fk)
  Product ID (fk)
  Quantity

ER Diagram
Good for OLTP
  Update in exactly one place
  No redundancy
  Oriented towards insertion, deletion, and modification of data
  Weak entities/relationships create normalised structures
What are the facts and dimensions?

Transformation of ER to Star

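[Figure: the ER entities regrouped into three dimension hierarchies around SALES FACTS]
  Product Dimension: DEPARTMENT, PRODUCT GROUP, PRODUCT ITEM
  Location Dimension: REGION, STORE
  Time Dimension: YEAR, MONTH, WEEK, DATE
  Fact: SALES FACTS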
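
A minimal sketch of the product side of this transformation: the normalised PRODUCT, PRODUCT GROUP and DEPARTMENT entities are flattened into one dimension row per product. The IDs and descriptions are hypothetical sample data, and the dictionary layout is only one possible representation.

```python
# Hypothetical normalised rows, keyed by ID, following the ER entities.
departments = {10: {"department_name": "Grocery"}}
product_groups = {20: {"product_group_desc": "Beverages", "department_id": 10}}
products = {30: {"product_desc": "Green tea", "product_group_id": 20}}

def build_product_dimension(products, product_groups, departments):
    """Flatten PRODUCT -> PRODUCT GROUP -> DEPARTMENT into one row per product."""
    dimension = {}
    for product_id, product in products.items():
        group = product_groups[product["product_group_id"]]
        department = departments[group["department_id"]]
        dimension[product_id] = {
            "product_desc": product["product_desc"],
            "product_group_desc": group["product_group_desc"],
            "department_name": department["department_name"],
        }
    return dimension

product_dimension = build_product_dimension(products, product_groups, departments)
```

The flattened row repeats the group and department names for every product, trading the ER model's lack of redundancy for a dimension that can be joined to the sales facts in a single step.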
