Dimensional Modelling
Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. Dimensional modeling has two basic concepts: facts and dimensions. Other related concepts are aggregates and metadata.
Fact
Definition: A fact is a collection of related data items, consisting of measures. A fact is a focus of interest for the decision-making process. Measures are continuously valued attributes that describe facts (Golfarelli et al.). A fact is a business measure (Kimball and Ross).
Examples of Facts
A university provides education services to its students. What are its facts and measures?
Facts                  Measures
Applications
Enrollment
Student performance
Student placement
Student awards         Title, amount
Fact
Each fact typically represents a business item (e.g., an order), a business transaction (e.g., order processing), or an event (e.g., arrival of an order).
Types of Facts
Additive: facts that can be summed across all of the dimensions in the fact table, e.g., Sales_Amount across date and product.
Semi-additive: facts that can be summed across some of the dimensions in the fact table but not others, e.g., current_balance can be summed across accounts but not across dates.
Non-additive: facts that cannot be summed across any of the dimensions in the fact table, e.g., a percentage or a profit margin.
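The three kinds of additivity can be illustrated with a minimal sketch. The fact rows, product codes, and account IDs below are hypothetical, and using the latest snapshot for a semi-additive balance is only one common choice (an average is another).

```python
from collections import defaultdict

def total_by(rows, key_index, value_index):
    """Sum one measure column, grouped by one key column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

# Hypothetical sales facts: (date, product, sales_amount). Additive measure.
sales = [
    ("2024-01-01", "P1", 100.0),
    ("2024-01-01", "P2", 50.0),
    ("2024-01-02", "P1", 70.0),
]
sales_by_date = total_by(sales, 0, 2)     # {'2024-01-01': 150.0, '2024-01-02': 70.0}
sales_by_product = total_by(sales, 1, 2)  # {'P1': 170.0, 'P2': 50.0}

# Hypothetical daily balance snapshots: (date, account, current_balance). Semi-additive.
balances = [
    ("2024-01-01", "A1", 1000.0),
    ("2024-01-01", "A2", 500.0),
    ("2024-01-02", "A1", 1200.0),
    ("2024-01-02", "A2", 400.0),
]
# Meaningful: total balance held across all accounts on each date.
holdings_by_date = total_by(balances, 0, 2)  # {'2024-01-01': 1500.0, '2024-01-02': 1600.0}
# Not meaningful: summing one account over dates (A1 would give 2200.0).
# Keep the latest snapshot per account instead.
latest_balance = {}
for date, account, balance in sorted(balances):
    latest_balance[account] = balance        # ends as {'A1': 1200.0, 'A2': 400.0}

# Hypothetical per-product revenue and profit. The margin is non-additive.
revenue = [100.0, 300.0]
profit = [20.0, 30.0]
margins = [p / r for p, r in zip(profit, revenue)]  # [0.2, 0.1]
naive_average = sum(margins) / len(margins)         # about 0.15, misleading
overall_margin = sum(profit) / sum(revenue)         # 50 / 400 = 0.125
```

The non-additive margin is recomputed from its additive parts (profit and revenue) rather than summed or averaged directly.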
Dimension
Definition
The parameter over which we want to analyze facts: sales is a fact; we analyze it over region, product, and time.
The parameter that gives meaning to a measure: the number of customers is a fact; we analyze it over time.
A discretely valued description that is more or less constant and participates in constraints.
Qualifying characteristics that provide additional perspective to a given fact.
Examples of Dimension
A university provides education services to its students. What are its facts and dimensions?
Facts: Applications, Enrollment, Performance, Placement, Student awards
Dimensions: Age, Region, Discipline, Year
Dimension      Value
Age            10, 11, 12, ...
Region         North, South, ...
Year           1999, 2000, ...
Discipline
Grades
Student        Name of student
Aspects of Dimensions
Dimension values do not always stay constant over time: there are slowly changing dimensions and rapidly changing dimensions, and the design needs to handle such changes.
Dimensions are the primary source of query constraints, report headings, and groupings
Dimension Hierarchies/Categories
Dimensions are composed of smaller units called categories or members: simpler components forming a hierarchy, e.g., country, zone, branch, unit. Hierarchies are the basis for drill-down and roll-up. Special, notable units (e.g., holidays) support special queries, such as sales performance on holidays.
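Roll-up and drill-down along a hierarchy can be sketched by grouping facts on a prefix of their hierarchy path. The country, zone, branch, and unit names and the sales values below are hypothetical; storing the full path on each fact is an assumption made to keep the example short.

```python
from collections import defaultdict

# Hierarchy levels from coarse to fine, as in the text: country, zone, branch, unit.
LEVELS = ("country", "zone", "branch", "unit")

# Hypothetical sales facts recorded at the finest level (unit),
# each tagged with its full hierarchy path.
facts = [
    (("C1", "North", "B1", "U1"), 120),
    (("C1", "North", "B1", "U2"), 80),
    (("C1", "North", "B2", "U3"), 50),
    (("C1", "South", "B3", "U4"), 200),
]

def roll_up(facts, level):
    """Aggregate facts up to `level`; a coarser level yields fewer, larger groups."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for path, amount in facts:
        totals[path[:depth]] += amount  # keep only the path prefix down to `level`
    return dict(totals)

print(roll_up(facts, "zone"))    # {('C1', 'North'): 250, ('C1', 'South'): 200}
print(roll_up(facts, "branch"))  # drilling down one level gives three branch totals
```

Rolling up moves toward "country" (fewer groups); drilling down moves toward "unit" (more detail).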
Figure: sales cube with the fact Sales (in Rupees) and three dimensions, each with a hierarchy: Product (Product type, Product name), Region (Region, City), and Time (Year, Season, Month).
Let there be 5000 products, 60 months, and 50 cities, and assume one sales fact per product, per city, per month:
Number of sales facts = 5000 * 60 * 50 = 15,000,000
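The cube-size arithmetic generalizes to any number of dimensions: a dense cube has one cell per combination of dimension members. A small sketch (the helper name `cube_cells` is hypothetical; `math.prod` needs Python 3.8 or later):

```python
from math import prod

def cube_cells(*cardinalities):
    """Number of cells in a dense cube: the product of the dimension cardinalities."""
    return prod(cardinalities)

# Cardinalities from the example: products, months, cities.
max_sales_facts = cube_cells(5000, 60, 50)
print(f"{max_sales_facts:,}")  # 15,000,000
```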
Sparse Facts
Not all 5000 products may be sold each month in each city.
Assume that 3000 products are sold each month in each city:
Number of sales facts = 3000 * 60 * 50 = 9,000,000
Approximately 60% of the cube is occupied and 40% is empty.
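The occupancy figure is the ratio of stored facts to possible cube cells. A minimal check of the numbers above:

```python
from math import prod

# Dense cube size from the example: products x months x cities.
dense = prod([5000, 60, 50])   # 15,000,000 possible cells

# Only 3000 products are actually sold in each city each month.
actual = prod([3000, 60, 50])  # 9,000,000 facts stored

occupied = actual / dense      # fraction of cells that hold a fact
print(f"occupied: {occupied:.0%}, empty: {1 - occupied:.0%}")  # occupied: 60%, empty: 40%
```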
Aggregation
Aggregates are pre-calculated summaries, derived from basic facts, along dimension hierarchies. Aggregation is performed to speed up common queries.
Example: we need the total sales for each region, product-wise and month-wise.
Number of products = 5000
Number of regions = 5
Number of months = 60
Total number of aggregate facts = 5000 * 5 * 60 = 1,500,000
Space-time tradeoff: if the frequency of use is high, pay the storage expense.
Aggregation guideline: if the number of facts summarized is more than 10, then do aggregation.
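Building a region-level aggregate from city-level facts, and applying the "more than 10 facts" guideline, can be sketched as follows. The city-to-region mapping and sales values are hypothetical, and judging the guideline by the average number of base facts per aggregate row is an assumption, not something the text specifies.

```python
from collections import defaultdict

# Hypothetical city -> region mapping (one step up the location hierarchy).
CITY_TO_REGION = {"C1": "North", "C2": "North", "C3": "South"}

# Hypothetical base facts at grain (product, city, month) -> sales in rupees.
base_facts = {
    ("P1", "C1", "2024-01"): 100,
    ("P1", "C2", "2024-01"): 250,
    ("P1", "C3", "2024-01"): 80,
    ("P2", "C1", "2024-01"): 40,
}

def aggregate_to_region(facts):
    """Pre-compute total sales per (product, region, month)."""
    totals = defaultdict(int)
    counts = defaultdict(int)  # number of base facts summarized by each aggregate row
    for (product, city, month), amount in facts.items():
        key = (product, CITY_TO_REGION[city], month)
        totals[key] += amount
        counts[key] += 1
    return dict(totals), dict(counts)

def worth_materializing(counts, threshold=10):
    """Guideline: aggregate if, on average, more than `threshold` facts are summarized."""
    return sum(counts.values()) / len(counts) > threshold

aggregates, counts = aggregate_to_region(base_facts)
print(aggregates)
# {('P1', 'North', '2024-01'): 350, ('P1', 'South', '2024-01'): 80, ('P2', 'North', '2024-01'): 40}
print(worth_materializing(counts))  # False: each aggregate row summarizes only 1 or 2 facts here
```

In a real warehouse the same grouping would typically run as a SQL `GROUP BY` whose result is stored as an aggregate fact table.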
Aggregation
Figure: dimension hierarchies (Year, Season), (Region), and (Product type, Product name), used to illustrate one-way and two-way aggregation.
When aggregation is done by rising along n dimensions, n-way aggregation is said to be performed.
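The possible n-way aggregates can be enumerated by choosing which n dimensions to rise along. This sketch assumes a single roll-up step per dimension (product name to product type, city to region, month to season), a simplification of the hierarchies in the figures; the dictionary layout is hypothetical.

```python
from itertools import combinations

# One roll-up step per dimension: (detailed level, summarized level).
ROLL_UPS = {
    "product": ("product name", "product type"),
    "location": ("city", "region"),
    "time": ("month", "season"),
}

def n_way_aggregates(n):
    """List the aggregate grains obtained by rising along exactly n dimensions."""
    grains = []
    for risen in combinations(ROLL_UPS, n):
        grain = tuple(
            ROLL_UPS[dim][1] if dim in risen else ROLL_UPS[dim][0]
            for dim in ROLL_UPS
        )
        grains.append(grain)
    return grains

for n in (1, 2):
    print(f"{n}-way:", n_way_aggregates(n))
```

With three dimensions there are three one-way and three two-way aggregates (plus one three-way aggregate), which is why the space-time tradeoff matters when deciding which ones to store.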
Metadata
Different definitions:
Data about the data
A table of contents for the data
Example
Entity Name: Customer
Aliases: Account, Client
Definition: Anyone who purchases hotel rooms
Source Systems: Reservations, Accounts, Housekeeping
Create Date: 1 January 2000
Last Update Date: 13 September 2003
Update Cycle: Weekly
Full Refresh Cycle: Six months
Data Quality Review: 15 September 2003
Planned Archival: Every six months
Information Delivery
Functions:
Report generation, query processing, complex analysis
Foundation metadata: business information about model elements, data types, keys and indexes, expressions
Software deployment: software deployed in the DW
Type mapping: mapping of data types between different systems
Metadata
The star contains detailed facts and dimensions. Aggregates are facts and have their own dimensions. Metadata support is built around the star schema.
          REGION:
PRODUCT:     SOUTHERN    WESTERN   NORTHERN    EASTERN      TOTAL
-----------------------------------------------------------------
Stibes         $7,140    $14,790    $13,260    $15,810    $51,000
Farkles         5,460     11,310     10,140     12,090     39,000
Teglers         3,150      6,525      5,850      6,975     22,500
Qwerts          5,250     11,875     10,750     12,625     40,500
-----------------------------------------------------------------
TOTALS:       $21,000    $44,500    $40,000    $47,500   $153,000
=================================================================
Is this a Relational Table? What is the Entity? What is the Identifier? What are the Attributes? How to make it a Relational Table?
REGION    PRODUCT       SALES
-----------------------------
Southern  Stibes       $7,140
Southern  Farkles       5,460
Southern  Teglers       3,150
Southern  Qwerts        5,250
Western   Stibes       14,790
Western   Farkles      11,310
Western   Teglers       6,525
Western   Qwerts       11,875
Northern  Stibes       13,260
Northern  Farkles      10,140
Northern  Teglers       5,850
Northern  Qwerts       10,750
Eastern   Stibes       15,810
Eastern   Farkles      12,090
Eastern   Teglers       6,975
Eastern   Qwerts       12,625
(all)     Stibes       51,000
(all)     Qwerts       40,500
Southern  (all)        21,000
(all)     (all)       153,000
-----------------------------
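Turning the cross-tab report into a relational table amounts to "unpivoting" it: each cell becomes one row identified by (region, product), and the totals become rows with "(all)" in the summarized column. A minimal sketch using the figures from the report; the function names are hypothetical.

```python
REGIONS = ["Southern", "Western", "Northern", "Eastern"]

# Sales per product, in REGIONS order, copied from the cross-tab report.
CROSS_TAB = {
    "Stibes":  [7140, 14790, 13260, 15810],
    "Farkles": [5460, 11310, 10140, 12090],
    "Teglers": [3150, 6525, 5850, 6975],
    "Qwerts":  [5250, 11875, 10750, 12625],
}

def unpivot(cross_tab):
    """Flatten the report into relational rows (region, product, sales)."""
    return [
        (region, product, amounts[i])
        for i, region in enumerate(REGIONS)
        for product, amounts in cross_tab.items()
    ]

def total(rows, region="(all)", product="(all)"):
    """Sum sales; "(all)" in a column means: do not filter on that column."""
    return sum(
        sales
        for r, p, sales in rows
        if region in ("(all)", r) and product in ("(all)", p)
    )

rows = unpivot(CROSS_TAB)
print(len(rows))                       # 16 detail rows, identified by (region, product)
print(total(rows, product="Stibes"))   # 51000
print(total(rows, region="Southern"))  # 21000
print(total(rows))                     # 153000
```

The entity is a sale summarized by region and product, the identifier is the pair (region, product), and the attribute is the sales amount.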
ER Diagram
REGION: Region ID, Region Name
STORE: Store ID, Store Name, Address, City, State, ZipCode, Region ID (fk)
SALES: Sales Date, Store ID (fk), Product ID (fk), Sale Amount, Sale Units
DEPARTMENT: Department ID, Department Name
PRODUCT: Product ID, Product Desc., Product Group ID (fk)
ER Diagram
Good for OLTP
Transformation of ER to Star
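Figure: the ER model transformed into a star schema. A central SALES FACTS table is linked to three dimensions: Product Dimension (DEPARTMENT, PRODUCT ITEM), Location Dimension (REGION), and Time Dimension (YEAR, MONTH, WEEK, DATE).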
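The ER-to-star transformation can be sketched as DDL: the normalized hierarchy tables (DEPARTMENT, REGION, and the calendar levels) are folded into denormalized dimension tables around one fact table. This sketch uses Python's standard `sqlite3` module with an in-memory database; all table names, column names, and sample rows are hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE product_dim (
    product_key   INTEGER PRIMARY KEY,
    product_item  TEXT,
    department    TEXT          -- DEPARTMENT folded into the product dimension
);
CREATE TABLE location_dim (
    location_key  INTEGER PRIMARY KEY,
    store_name    TEXT,
    city          TEXT,
    state         TEXT,
    region        TEXT          -- REGION folded into the location dimension
);
CREATE TABLE time_dim (
    date_key      INTEGER PRIMARY KEY,
    sale_date     TEXT,
    week          INTEGER,
    month         INTEGER,
    year          INTEGER
);
CREATE TABLE sales_facts (
    date_key      INTEGER REFERENCES time_dim(date_key),
    product_key   INTEGER REFERENCES product_dim(product_key),
    location_key  INTEGER REFERENCES location_dim(location_key),
    sale_amount   REAL,
    sale_units    INTEGER,
    PRIMARY KEY (date_key, product_key, location_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Hypothetical sample rows: one member per dimension and one fact.
conn.execute("INSERT INTO product_dim VALUES (1, 'Item A', 'Dept 1')")
conn.execute("INSERT INTO location_dim VALUES (1, 'Store 1', 'City 1', 'State 1', 'North')")
conn.execute("INSERT INTO time_dim VALUES (20240105, '2024-01-05', 1, 1, 2024)")
conn.execute("INSERT INTO sales_facts VALUES (20240105, 1, 1, 250.0, 5)")

# Typical star query: join the fact table to its dimensions, group by dimension attributes.
query = """
SELECT p.department, l.region, t.year, SUM(f.sale_amount)
FROM sales_facts f
JOIN product_dim p  ON f.product_key = p.product_key
JOIN location_dim l ON f.location_key = l.location_key
JOIN time_dim t     ON f.date_key = t.date_key
GROUP BY p.department, l.region, t.year
"""
result = conn.execute(query).fetchall()
print(result)  # [('Dept 1', 'North', 2024, 250.0)]
```

Each query needs only one join per dimension, whereas the normalized ER model would need extra joins through the hierarchy tables; that is the usual reason the ER form suits OLTP and the star suits analysis.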