
Tutorial # 1

• London Metropolitan University is composed of various departments, each offering a variety of courses. Each course consists of a set of modules, which are taken in either of the semesters. The university library provides services that enable students to borrow books, journals, videos, etc. once their registration details (i.e. Student ID) have been validated. This can be done either online through the university's website or directly at the library counter. The library provides a vast and comprehensive array of study materials (books, journals, videos, etc.) for each course module. Students may keep a borrowed item for a specified period, after which a fine is incurred if the deadline is exceeded. Borrowed items can, however, be renewed before the expiry date.
• Library management needs a data warehouse application that can help support key decisions that will, in turn, improve the services offered to students.

a) In line with a data warehouse being subject-oriented, use the above scenario to identify what the "key focus" of this data warehouse application should be.
b) Identify all the dimensions for the data warehouse application. Considering your answer to question (a), identify at least two facts (or measures) that will be contained in the fact table. (NOTE: Any assumptions should be clearly stated.)
c) Draw a simple star schema for the above data warehouse, showing only the primary key-foreign key relationships.

Dimensional Data Modelling
1. Date and Time dimensions
2. Degenerate Dimensions
3. Slowly Changing Dimensions (SCD)
4. Aggregate Fact tables
5. Three main types of fact tables
6. Developing dimensional data models using an iterative process
Date Dimensions
The date dimension is important for every fact table, because facts are a sequence of observations recorded over time.
The date dimension answers the first question asked when identifying the dimensions of a fact table: when does it occur?
It allows us to meet many user reporting and analysis requirements, such as:
• Calendar periods (day, week, month, quarter, and year)
• Financial periods (financial month, financial year)
• Relative periods for comparison (last month, last year)
• Periods of special status (weekdays, weekends, holidays)
A typical date dimension
Date Dim
DateKey (PK)
Date
WeekNumber
MonthName
MonthNumber
Quarter
Year
FinancialMonth
FinancialQuarter
FinancialYear
IsWeekDay
IsWeekEnd
IsHoliday
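The attributes above are normally generated in code rather than loaded from a source system. The following is a minimal Python sketch that builds one row per day. It assumes a financial year that starts in April (as in the UK), a caller-supplied set of holiday dates, and a yyyymmdd integer DateKey. The function name `build_date_dim` and these choices are illustrative and are not part of the original design.

```python
from datetime import date, timedelta

def build_date_dim(start, end, holidays=frozenset(), fy_start_month=4):
    """Return one dict per day from start to end inclusive.

    Assumption: the financial year begins on the 1st of fy_start_month
    (April by default) and is labelled by the calendar year it starts in.
    """
    rows = []
    day = start
    while day <= end:
        fy_offset = (day.month - fy_start_month) % 12  # months into the financial year, 0..11
        is_weekend = day.weekday() >= 5                # Saturday=5, Sunday=6
        rows.append({
            "DateKey": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20240401
            "Date": day,
            "WeekNumber": day.isocalendar()[1],        # ISO week number
            "MonthName": day.strftime("%B"),
            "MonthNumber": day.month,
            "Quarter": (day.month - 1) // 3 + 1,
            "Year": day.year,
            "FinancialMonth": fy_offset + 1,
            "FinancialQuarter": fy_offset // 3 + 1,
            "FinancialYear": day.year if day.month >= fy_start_month else day.year - 1,
            "IsWeekDay": not is_weekend,
            "IsWeekEnd": is_weekend,
            "IsHoliday": day in holidays,
        })
        day += timedelta(days=1)
    return rows

# Usage: four days around the start of the (assumed) April financial year.
dim = build_date_dim(date(2024, 3, 30), date(2024, 4, 2), holidays={date(2024, 4, 1)})
easter_monday = dim[2]  # 2024-04-01
print(easter_monday["DateKey"], easter_monday["FinancialMonth"], easter_monday["IsHoliday"])
# 20240401 1 True
```

Generating the rows once, for a range of years, means every report can filter on flags such as IsHoliday instead of recomputing calendars in each query.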
Time Dimensions
• The date dimension is at a daily grain (level of detail).
• Sometimes we are interested not only in the day on which a fact occurred but also in the time at which it occurred.
• One option is to include a DateTime attribute in the fact table. However, this option does not support analysis by periods of the day such as morning, afternoon, and evening.
• The other option is to introduce a new dimension called Time that stores individual hours or periods of time.

Figure: star schema in which the Order Fact table (DateKey, TimeKey, ProductKey, CustomerKey) references Calendar Date (DateKey, Date, WeekNumber, …), Time Dim (TimeKey, IsMorning, IsAfternoon, IsEvening), Product (ProductKey, SKU, Name, …) and Customer (CustomerKey, Code, FirstName, …).
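A Time dimension at a grain of one row per minute has 1,440 rows. The sketch below builds it and maps a source timestamp to its TimeKey. The period boundaries are hypothetical and should be agreed with the business: morning 05:00 to 11:59, afternoon 12:00 to 16:59, evening 17:00 to 21:59. The hhmm key format and the extra Hour/Minute columns are also assumptions.

```python
from datetime import datetime

def build_time_dim():
    """One row per minute of the day (1,440 rows).

    Hypothetical period boundaries: morning 05:00-11:59, afternoon 12:00-16:59,
    evening 17:00-21:59; minutes outside these ranges get no period flag.
    """
    rows = []
    for hour in range(24):
        for minute in range(60):
            rows.append({
                "TimeKey": hour * 100 + minute,  # hhmm, e.g. 1430 for 14:30
                "Hour": hour,
                "Minute": minute,
                "IsMorning": 5 <= hour < 12,
                "IsAfternoon": 12 <= hour < 17,
                "IsEvening": 17 <= hour < 22,
            })
    return rows

def time_key(ts):
    """Map a source timestamp to the TimeKey stored in the fact table."""
    return ts.hour * 100 + ts.minute

# Usage: look up the period flags for an order placed at 14:30.
time_dim = {row["TimeKey"]: row for row in build_time_dim()}
order_ts = datetime(2024, 5, 17, 14, 30)
print(time_key(order_ts), time_dim[time_key(order_ts)]["IsAfternoon"])  # 1430 True
```

The fact row stores DateKey and TimeKey separately. "Sales in the evening" therefore becomes a simple filter on the Time dimension, however many days are involved.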

Degenerate Dimensions
What are degenerate dimensions?
They are dimensions that have no dimension table of their own linked to the fact table; instead, their attributes are stored in the fact table itself.
Usually one attribute of each degenerate dimension is stored in the fact table. This attribute is called the degenerate key.
Degenerate dimensions have no attributes other than the degenerate key, which is why a separate dimension table is not needed.
An example of a degenerate dimension is Order Header with its order number attribute. Because the Order Header dimension has only one attribute, there is no need to create a dimension table; the order number is stored in the fact table.
Figure: Order Header (OrderNumber, OrderDate) next to the Order Fact table (DateKey, ProductKey, CustomerKey, OrderLineNumber, OrderNumber, QuantityOrdered, TotalCost, TotalRevenue), in which OrderNumber is kept as the degenerate key.
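Because the degenerate key sits in the fact table, order-level questions can be answered without any join. The following is a minimal sketch using Python's built-in sqlite3 module. The table is a cut-down OrderFact, the primary key (OrderNumber, OrderLineNumber) is an assumption, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderFact (
        DateKey         INTEGER NOT NULL,
        ProductKey      INTEGER NOT NULL,
        CustomerKey     INTEGER NOT NULL,
        OrderNumber     TEXT    NOT NULL,  -- degenerate key: there is no OrderHeader table
        OrderLineNumber INTEGER NOT NULL,
        QuantityOrdered INTEGER NOT NULL,
        TotalRevenue    REAL    NOT NULL,
        PRIMARY KEY (OrderNumber, OrderLineNumber)  -- assumed key for this sketch
    )""")
conn.executemany(
    "INSERT INTO OrderFact VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (20240105, 1, 7, "SO-1001", 1, 2, 40.0),
        (20240105, 2, 7, "SO-1001", 2, 1, 15.0),
        (20240106, 1, 9, "SO-1002", 1, 5, 100.0),
    ],
)

# Order totals come straight from the fact table, grouped by the degenerate key.
order_totals = conn.execute("""
    SELECT OrderNumber, COUNT(*) AS Lines, SUM(TotalRevenue) AS Revenue
    FROM OrderFact
    GROUP BY OrderNumber
    ORDER BY OrderNumber
""").fetchall()
print(order_totals)  # [('SO-1001', 2, 55.0), ('SO-1002', 1, 100.0)]
```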
Slowly Changing Dimensions
One of the main benefits of data warehousing is tracking history, i.e. changes over time.
Examples:
• When a product changes its price.
• When a customer changes address or marital status.
• When a store changes its manager.
Examples of user reporting and analysis requirements that depend on tracking history or changes:
• Marketing people might want to compare the impact of promotions at different product prices.
• Salespeople might want to generate reports of total sales by marital status.
• Reporting the performance of managers by store.
There are three main techniques for handling slowly
changing dimensions:
I. Type 1: Overwrite the dimension attribute.
II. Type 2: Add a new dimension row.
III. Type 3: Add a new dimension attribute / column.
Type 1: Overwrite the dimension attribute:
• The dimension attribute reflects the current state.
• Any historical values are lost.
• Applicable when the old value has no business significance or when there is no need to track changes, for example the first name or phone number of a customer.
The Type 1 technique is easy to implement.
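A Type 1 change is a single UPDATE on the existing row. The following is a minimal sqlite3 sketch. The Phone column and the sample values are hypothetical, and the whitelist is there because column names cannot be passed as SQL parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customer (
    CustomerKey INTEGER PRIMARY KEY,  -- surrogate key
    Code        TEXT NOT NULL,        -- natural (business) key
    FirstName   TEXT NOT NULL,
    Phone       TEXT)                 -- hypothetical Type 1 attribute""")
conn.execute("INSERT INTO Customer (Code, FirstName, Phone) VALUES ('CBTR', 'Sara', '020 7000 0001')")

TYPE1_COLUMNS = {"FirstName", "Phone"}  # attributes whose history is not needed

def scd_type1_update(conn, code, column, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    if column not in TYPE1_COLUMNS:  # column names cannot be bound as parameters
        raise ValueError(f"not a Type 1 attribute: {column}")
    conn.execute(f"UPDATE Customer SET {column} = ? WHERE Code = ?", (new_value, code))

scd_type1_update(conn, "CBTR", "Phone", "020 7000 0002")
rows = conn.execute("SELECT CustomerKey, Phone FROM Customer").fetchall()
print(rows)  # [(1, '020 7000 0002')]: still one row and the same key, old number gone
```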

Type 2: Add a new dimension row:
• A new row with a new surrogate primary key is inserted into the dimension table.
• Both the previous and the new row include the natural key, to show that both records have the same origin.
• New attributes are added to indicate when the change happened and which row is the current one.
The Type 2 technique is the most powerful technique for accurately tracking changes in attribute values.
Customer (the tracked attribute is MaritalStatus)
CustomerKey
Code
FirstName
LastName
BirthDate
MaritalStatus
IncomeGroup
EffectiveFrom
EffectiveTo

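CustomerKey  Code  FirstName  LastName  BirthDate  MaritalStatus  IncomeGroup  EffectiveFrom  EffectiveTo
1            CBTR  Sara       John      01/2/88    Single         20-30K       2/3/2005       4/5/2010
2            CBTR  Sara       John      01/2/88    Married        20-30K       4/5/2010       31/12/9999

Sara's marital status changed on 4/5/2010. Row 1 was closed on that date, and row 2 was inserted with a new surrogate key but the same natural key (Code). The far-future EffectiveTo date 31/12/9999 marks row 2 as the current row.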
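The following sqlite3 sketch replays the example above. It reads the dates as day/month/year, stores them as ISO strings, and uses 9999-12-31 as the open-ended EffectiveTo. The function names `scd_type2_change` and `status_on` are illustrative. The validity interval is treated as half-open (EffectiveFrom <= day < EffectiveTo), so the change date belongs only to the new row.

```python
import sqlite3

OPEN_END = "9999-12-31"  # far-future EffectiveTo that marks the current row

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customer (
    CustomerKey   INTEGER PRIMARY KEY,  -- surrogate key: a new one per version
    Code          TEXT NOT NULL,        -- natural key shared by all versions
    FirstName     TEXT, LastName TEXT, BirthDate TEXT,
    MaritalStatus TEXT,                 -- the tracked attribute
    IncomeGroup   TEXT,
    EffectiveFrom TEXT NOT NULL,
    EffectiveTo   TEXT NOT NULL)""")

INSERT_SQL = ("INSERT INTO Customer (Code, FirstName, LastName, BirthDate, MaritalStatus,"
              " IncomeGroup, EffectiveFrom, EffectiveTo) VALUES (?, ?, ?, ?, ?, ?, ?, ?)")
conn.execute(INSERT_SQL, ("CBTR", "Sara", "John", "1988-02-01", "Single", "20-30K",
                          "2005-03-02", OPEN_END))

def scd_type2_change(conn, code, new_status, change_date):
    """Type 2: close the current row and insert a new version with a new surrogate key."""
    with conn:  # one transaction: both statements are committed together or not at all
        current = conn.execute(
            "SELECT FirstName, LastName, BirthDate, IncomeGroup FROM Customer"
            " WHERE Code = ? AND EffectiveTo = ?", (code, OPEN_END)).fetchone()
        if current is None:
            raise LookupError(f"no current row for customer {code}")
        first, last, birth, income = current
        conn.execute("UPDATE Customer SET EffectiveTo = ? WHERE Code = ? AND EffectiveTo = ?",
                     (change_date, code, OPEN_END))
        conn.execute(INSERT_SQL, (code, first, last, birth, new_status, income,
                                  change_date, OPEN_END))

def status_on(conn, code, day):
    """Marital status valid on an ISO date: EffectiveFrom <= day < EffectiveTo."""
    row = conn.execute("SELECT MaritalStatus FROM Customer WHERE Code = ?"
                       " AND EffectiveFrom <= ? AND ? < EffectiveTo", (code, day, day)).fetchone()
    return row[0] if row else None

scd_type2_change(conn, "CBTR", "Married", "2010-05-04")
print(status_on(conn, "CBTR", "2008-01-01"), status_on(conn, "CBTR", "2012-01-01"))
# Single Married
```

Fact rows loaded before the change keep pointing at CustomerKey 1, and later ones at CustomerKey 2. This is what lets "total sales by marital status" reflect the status at the time of each sale.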
Type 3: Add a new dimension column:
• A new column is added to the dimension table.
• Used when an attribute changes but it is sufficient to keep the old and new values side by side in the same row.
• No new dimension rows and no new surrogate keys are created.
Store Dim (the tracked attribute is Manager)
StoreKey
Code
Name
Description
PreviousManager
CurrentManager
…..

Before the change:
StoreKey  Code  Name        Description     PreviousManager  CurrentManager  …
1         ASTT  Main store  The main store  David            David           …

After Andrew becomes the manager:
StoreKey  Code  Name        Description     PreviousManager  CurrentManager  …
1         ASTT  Main store  The main store  David            Andrew          …
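A Type 3 change is one UPDATE that moves the current value into the "previous" column and then overwrites it. The following sqlite3 sketch uses the StoreDim columns shown above. It relies on standard SQL semantics, where every right-hand side of SET sees the row as it was before the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE StoreDim (
    StoreKey        INTEGER PRIMARY KEY,
    Code            TEXT NOT NULL,
    Name            TEXT,
    Description     TEXT,
    PreviousManager TEXT,   -- Type 3: value before the latest change
    CurrentManager  TEXT)""")
conn.execute("INSERT INTO StoreDim VALUES (1, 'ASTT', 'Main store', 'The main store', 'David', 'David')")

def scd_type3_change(conn, code, new_manager):
    """Type 3: shift CurrentManager into PreviousManager, then overwrite CurrentManager."""
    # Both SET expressions are evaluated against the old row, so PreviousManager
    # receives the old CurrentManager, not the new value.
    conn.execute("UPDATE StoreDim SET PreviousManager = CurrentManager, CurrentManager = ?"
                 " WHERE Code = ?", (new_manager, code))

scd_type3_change(conn, "ASTT", "Andrew")
print(conn.execute("SELECT PreviousManager, CurrentManager FROM StoreDim").fetchone())
# ('David', 'Andrew')
```

Only one level of history survives. A further change to a new manager would push Andrew into PreviousManager and discard David. This is why Type 2 is preferred when full history is required.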
Aggregate fact tables
Fact tables contain a large number of rows and grow rapidly over time.
For example, a fact table of daily sales transactions for thousands of products grows every day by the number of product sales made that day.
Most of the time, managers are not interested in detailed reports of daily sales per product.
Monthly sales per product type or yearly sales per product category are more relevant for managers. An aggregate fact table stores such summaries at a coarser grain, so these reports do not have to scan the detailed fact table.
Figure: the Aggregate Order Fact table (MonthKey, ProductTypeKey, Total quantity, Total revenue) references Calendar Month (MonthKey, Name, Number, Year) and Product Type (ProductTypeKey, Name, Description). The detailed Order Fact table (DateKey, ProductKey, CustomerKey, OrderNumber, OrderLineNumber, Total quantity, Total revenue) references Calendar Date (DateKey, Date, WeekNumber, MonthNumber, MonthName, Year), Product (ProductKey, SKU, Name, Brand, Category, Price, Cost) and Customer (CustomerKey, Code, FirstName, LastName, BirthDate).
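An aggregate fact table is usually populated by grouping the detailed fact table. The following is a minimal sqlite3 sketch with two assumptions: DateKey has the form yyyymmdd (so DateKey / 100 gives a yyyymm MonthKey), and each Product row carries a ProductTypeKey. Neither is shown in the figure above. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductKey INTEGER PRIMARY KEY, ProductTypeKey INTEGER NOT NULL);
    CREATE TABLE OrderFact (            -- detailed grain: one row per order line
        DateKey INTEGER, ProductKey INTEGER, CustomerKey INTEGER,
        OrderNumber TEXT, OrderLineNumber INTEGER,
        TotalQuantity INTEGER, TotalRevenue REAL);
    CREATE TABLE AggregateOrderFact (   -- coarser grain: one row per month and product type
        MonthKey INTEGER, ProductTypeKey INTEGER,
        TotalQuantity INTEGER, TotalRevenue REAL,
        PRIMARY KEY (MonthKey, ProductTypeKey));
""")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
conn.executemany("INSERT INTO OrderFact VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (20240105, 1, 7, "SO-1", 1, 2, 40.0),
    (20240120, 2, 8, "SO-2", 1, 1, 15.0),
    (20240120, 3, 8, "SO-2", 2, 4, 80.0),
    (20240203, 1, 9, "SO-3", 1, 3, 60.0),
])

# Roll the detail up to month x product type (integer division: 20240105 / 100 = 202401).
conn.execute("""
    INSERT INTO AggregateOrderFact (MonthKey, ProductTypeKey, TotalQuantity, TotalRevenue)
    SELECT f.DateKey / 100, p.ProductTypeKey, SUM(f.TotalQuantity), SUM(f.TotalRevenue)
    FROM OrderFact AS f JOIN Product AS p ON p.ProductKey = f.ProductKey
    GROUP BY f.DateKey / 100, p.ProductTypeKey
""")
agg = conn.execute("SELECT * FROM AggregateOrderFact ORDER BY MonthKey, ProductTypeKey").fetchall()
print(agg)  # [(202401, 10, 3, 55.0), (202401, 20, 4, 80.0), (202402, 10, 3, 60.0)]
```

Four detail rows become three aggregate rows here. With thousands of products sold daily, the reduction is what makes monthly reports fast, at the cost of refreshing the aggregate after each load.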
Types of Fact tables
There are three main types of fact tables:
I. Transactional fact tables: a new row is inserted when an activity or event occurs.
II. Periodic snapshot fact tables: rows are inserted at predetermined intervals, such as daily, weekly, or monthly.
III. Accumulating snapshot fact tables: represent processes with a defined start and end; each row is updated as the process reaches its milestones.
Figure: examples of the three types of fact tables.
• Transactional fact tables: Order Fact (DateKey, ProductKey, CustomerKey, OrderNumber, OrderLineNumber, Total quantity, Total revenue); Account transaction (AccountKey, DateKey, BranchKey, Type, Amount).
• Periodic snapshot fact tables: StockLevel (DateKey, ProductKey, EmployeeKey, Quantity); Account Balance (AccountKey, MonthKey, AccountBalance, NumberOfTransactions).
• Accumulating snapshot fact table: Order (OrderID, OrderDateKey, ProcessedDateKey, DispatchedDateKey, IsProcessed, IsDispatched, ProcessingTimeLag).
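The accumulating snapshot is the only one of the three types whose rows are updated after insertion. The following sqlite3 sketch follows the Order table above through its milestones. It assumes yyyymmdd date keys and assumes that ProcessingTimeLag means whole days from order to processing; the source does not define the lag.

```python
import sqlite3
from datetime import datetime

def key_to_date(key):
    """Convert an assumed yyyymmdd DateKey (e.g. 20240130) to a date."""
    return datetime.strptime(str(key), "%Y%m%d").date()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE OrderAccumulatingFact (
    OrderID           TEXT PRIMARY KEY,  -- one row per order for its whole lifetime
    OrderDateKey      INTEGER NOT NULL,
    ProcessedDateKey  INTEGER,           -- NULL until the milestone is reached
    DispatchedDateKey INTEGER,
    IsProcessed       INTEGER NOT NULL DEFAULT 0,
    IsDispatched      INTEGER NOT NULL DEFAULT 0,
    ProcessingTimeLag INTEGER            -- assumed: days from order to processing
)""")

def record_order(conn, order_id, order_key):
    """Milestone 1: the order is placed, so the row is inserted."""
    conn.execute("INSERT INTO OrderAccumulatingFact (OrderID, OrderDateKey) VALUES (?, ?)",
                 (order_id, order_key))

def record_processed(conn, order_id, processed_key):
    """Milestone 2: update the same row and compute the lag in days."""
    (order_key,) = conn.execute("SELECT OrderDateKey FROM OrderAccumulatingFact"
                                " WHERE OrderID = ?", (order_id,)).fetchone()
    lag = (key_to_date(processed_key) - key_to_date(order_key)).days
    conn.execute("UPDATE OrderAccumulatingFact SET ProcessedDateKey = ?, IsProcessed = 1,"
                 " ProcessingTimeLag = ? WHERE OrderID = ?", (processed_key, lag, order_id))

def record_dispatched(conn, order_id, dispatched_key):
    """Milestone 3: update the same row again."""
    conn.execute("UPDATE OrderAccumulatingFact SET DispatchedDateKey = ?, IsDispatched = 1"
                 " WHERE OrderID = ?", (dispatched_key, order_id))

record_order(conn, "O-1", 20240130)
record_processed(conn, "O-1", 20240202)
record_dispatched(conn, "O-1", 20240205)
print(conn.execute("SELECT * FROM OrderAccumulatingFact").fetchall())
# [('O-1', 20240130, 20240202, 20240205, 1, 1, 3)]
```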
Developing Dimensional Data Models
Developing a dimensional data model is an iterative process with three main phases:
I. Create a high-level dimensional data model.
II. Identify attributes of dimensions and measures of fact tables.
III. Build a detailed dimensional data model.


Create a high-level dimensional data model
• Identify a central fact table.
• Define the grain (level of detail).
• Identify dimensions using the following questions: when does it occur, what is involved, who is involved, where does it occur, how does it occur, and why does it occur?
• Create the high-level dimensional model diagram.
An example of a high-level dimensional data model diagram representing the product-selling business process:
Figure: the Sale Transaction fact table surrounded by the Date, Product, Transaction Type, Payment Type, Customer, Register, Promotion and Outlet dimensions.
Identify attributes of dimensions and measures of the central fact table
• List the attributes of each dimension based on the business analysis and reporting requirements.
• Identify measures based on the business analysis and reporting requirements.
Build a detailed dimensional data model
• Enrich the high-level model with missing information.
• Resolve design issues.
• Test whether the dimensional data model is complete against the business requirements.
• Identify, understand, and profile data sources.


An example of a detailed design for one dimension table.


Tutorial # 2

1. The following tables form part of a database held in a relational DBMS:
Hotel (hotelNo, hotelName, City)
Room (roomNo, hotelNo, type, price)
Booking (hotelNo, guestNo, datefrom, dateTo, roomNo)
Guest (guestNo, guestName, guestAddress, guestcardNo, expiryDate)

Solution of No. 2

1. Understanding of Data Sources using Profiling techniques.

2. Building Detailed Designs of Dimensions and Fact tables

3. Implementing a data warehouse.

Understanding of Data Sources
The goal of data understanding is to know the structure, relationships, content and rules of the potential data sources that will feed data to a data warehouse. Assessing accessibility and data quality issues is also part of data understanding.
There are different techniques for understanding data sources:
• Consulting database administrators
• Examining documents or specifications such as ERD models
• Data profiling
Data profiling is the application of SQL commands or profiling software tools to collect information and statistics about data sources.

It should include:
• Data type, and minimum and maximum field lengths.
• Mean, mode, minimum and maximum values of numeric columns.
• Total number of records and number of distinct records.
• Number of NULL values.
• Any patterns identified.
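Most of the statistics in this list can be gathered with one or two queries per column. The following sketch uses SQLite through Python's sqlite3 module rather than SQL Server, so the SQL differs in places; for example, SQLite uses LENGTH and typeof, where T-SQL would use LEN. The `profile_column` helper and the sample Room rows, which follow the hotel schema from Tutorial 2, are illustrative. The mean is computed only over numeric values.

```python
import sqlite3

def profile_column(conn, table, column):
    """Basic profile of one column.

    Identifiers are interpolated into the SQL, so table and column must come from a
    trusted list (e.g. the source's own catalogue), never from user input.
    """
    t, c = f'"{table}"', f'"{column}"'
    records, distinct, nulls, min_v, max_v, mean, min_len, max_len = conn.execute(f"""
        SELECT COUNT(*), COUNT(DISTINCT {c}), SUM({c} IS NULL), MIN({c}), MAX({c}),
               AVG(CASE WHEN typeof({c}) IN ('integer', 'real') THEN {c} END),
               MIN(LENGTH({c})), MAX(LENGTH({c}))
        FROM {t}""").fetchone()
    mode = conn.execute(f"""
        SELECT {c} FROM {t} WHERE {c} IS NOT NULL
        GROUP BY {c} ORDER BY COUNT(*) DESC, {c} LIMIT 1""").fetchone()
    types = sorted(r[0] for r in conn.execute(f"SELECT DISTINCT typeof({c}) FROM {t}"))
    return {"types": types, "records": records, "distinct": distinct, "nulls": nulls,
            "min": min_v, "max": max_v, "mean": mean,
            "mode": mode[0] if mode else None,
            "min_length": min_len, "max_length": max_len}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Room (roomNo INTEGER, hotelNo INTEGER, type TEXT, price REAL)")
conn.executemany("INSERT INTO Room VALUES (?, ?, ?, ?)", [
    (1, 1, "Single", 40.0), (2, 1, "Double", 60.0), (3, 1, "Double", 60.0), (1, 2, "Twin", None),
])
price = profile_column(conn, "Room", "price")
room_type = profile_column(conn, "Room", "type")
print(price["nulls"], price["mode"], room_type["distinct"], room_type["max_length"])  # 1 60.0 3 6
```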
Demo using SQL Server Management Studio
Build Detailed Designs of Dimension and Fact Tables
An example of a detailed design for a dimension
table.
What information do we need to build detailed designs of dimension or fact tables?
• The design of the data warehouse (dimensional data models).
• An understanding of the data sources.

Steps:
For each dimension and fact table
- Define columns and identify primary keys.
- Identify data types of each column.
- Build a source to target mapping for each column.
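A source-to-target mapping records, for each target column, where its value comes from and how it is transformed. It can be kept as data and applied mechanically during loading. In the following sketch, the customer dimension columns, the source column names (cust_name, sex, dob, ...) and the transformations are all hypothetical. The surrogate key is not mapped because the warehouse generates it.

```python
from datetime import datetime

# Hypothetical mapping for a customer dimension: (target column, source column, transformation).
# CustomerKey is absent on purpose: the surrogate key is generated by the warehouse.
CUSTOMER_MAPPING = [
    ("Name",      "cust_name", str.strip),
    ("Gender",    "sex",       lambda v: v.strip()[:1].upper()),  # 'female' -> 'F'
    ("BirthDate", "dob",       lambda v: datetime.strptime(v, "%d/%m/%Y").date()),
    ("City",      "city",      str.title),
    ("Region",    "region",    str.title),
    ("State",     "state",     str.upper),
]

def apply_mapping(source_row, mapping):
    """Build one target row from one source row using the mapping."""
    return {target: transform(source_row[source]) for target, source, transform in mapping}

source_row = {"cust_name": "  Ana Pop ", "sex": "female", "dob": "01/02/1988",
              "city": "LONDON", "region": "greater london", "state": "uk"}
target_row = apply_mapping(source_row, CUSTOMER_MAPPING)
print(target_row["Name"], target_row["Gender"], target_row["BirthDate"], target_row["Region"])
# Ana Pop F 1988-02-01 Greater London
```

Keeping the mapping as data means the same document can serve as design documentation and as input to the load process.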
Implementing a Data Warehouse
Once the design stage is complete, the next step is to implement the data warehouse:
1. Create a database to store the dimension and fact tables.
2. Create dimension tables.
3. Create fact tables.
4. Add primary key and foreign key constraints.
5. Create a separate database to hold the staging, security, and auditing tables.
Wholesale furniture company
Design the data warehouse for a wholesale furniture company. The data warehouse must allow the company's situation to be analysed at least with respect to Furniture, Customers and Time.
Moreover, the company needs to analyse:
1. The furniture with respect to its type (chair, table, wardrobe, cabinet, ...), category (kitchen, living room, bedroom, bathroom, office, ...) and material (wood, marble).
2. The customers with respect to their spatial location, considering at least cities, regions and states.
The company is interested in learning at least the quantity, income and discount of its sales.
Identify Central Fact Table
Identify Dimension Tables
Build a high-level dimensional data model
Figure: the Sales fact table linked to the Furniture, Customer and Date dimensions.
Identify attributes of dimensions and measures of the central fact table:
Date Dim: DateKey, Date, Name, WeekNumber, MonthNumber, MonthName, Quarter, Year
Customer Dim: CustomerKey, Name, Gender, BirthDate, City, Region, State
Furniture Dim: FurnitureKey, Type, Category, Material
Sales Fact: DateKey, FurnitureKey, CustomerKey, Quantity, Income, Discount
Build Detailed Designs of Dimension and Fact Tables:

Column Name   Column Def.     Data Type     Key     Null
DateKey       Surrogate key   int           PK      NOT NULL
Date          SQL date        date          -       NOT NULL
Name                          varchar(30)   -       NOT NULL
WeekNumber                    int           -       NOT NULL
MonthName                     varchar(30)   -       NOT NULL
MonthNumber                   int           -       NOT NULL
Quarter                       int           -       NOT NULL
Year                          int           -       NOT NULL

A simplified design of the Date dimension.


Column Name   Column Def.     Data Type     Key     Null
CustomerKey   Surrogate key   int           PK      NOT NULL
Name                          varchar(50)   -       NOT NULL
Gender                        char(1)       -       NOT NULL
BirthDate                     date          -       NOT NULL
City                          varchar(50)   -       NOT NULL
Region                        varchar(50)   -       NOT NULL
State                         varchar(50)   -       NOT NULL

A simplified design of the Customer dimension without source-to-target mapping.


Column Name   Column Def.     Data Type     Key     Null
FurnitureKey  Surrogate key   int           PK      NOT NULL
Type                          varchar(25)   -       NOT NULL
Category                      varchar(25)   -       NOT NULL
Material                      varchar(25)   -       NOT NULL

A simplified design of the Furniture dimension without source-to-target mapping.


Column Name   Column Def.     Data Type     Key     Null
DateKey       Surrogate key   int           PK, FK  NOT NULL
CustomerKey   Surrogate key   int           PK, FK  NOT NULL
FurnitureKey  Surrogate key   int           PK, FK  NOT NULL
Quantity                      int           -       NOT NULL
Income                        double        -       NOT NULL
Discount                      double        -       NOT NULL

A simplified design of the Sale fact table without source-to-target mapping.


Create a relational database to store dimension and fact tables:

CREATE DATABASE MainDWDatabase2016
ON (NAME = 'MainDWDatabase2016_Data',
    FILENAME = 'C:\DataWarehouse\MainDWDatabase2016_Data.mdf',
    SIZE = 1000, FILEGROWTH = 50)
LOG ON (NAME = 'MainDWDatabase2016_Log',
    FILENAME = 'C:\DataWarehouse\MainDWDatabase2016_Log.ldf',
    SIZE = 20, FILEGROWTH = 96);
GO

The above SQL command creates an empty database named 'MainDWDatabase2016' and allocates two files: one for storing data and another for storing the transaction log. SIZE and FILEGROWTH values given without a unit suffix are in megabytes.
Create dimension tables:

CREATE TABLE dbo.DateDim (
    DateKey     int IDENTITY(1,1) NOT NULL,
    Date        date NOT NULL,
    Name        varchar(30) NOT NULL,
    WeekNumber  int NOT NULL,
    MonthName   varchar(30) NOT NULL,
    MonthNumber int NOT NULL,
    Quarter     int NOT NULL,
    Year        int NOT NULL
) ON [PRIMARY];
GO

SQL command to create a date dimension table.
CREATE TABLE dbo.CustomerDim (
    CustomerKey int IDENTITY(1,1) NOT NULL,
    Name        varchar(50) NOT NULL,
    Gender      char(1) NOT NULL,
    BirthDate   date NOT NULL,
    City        varchar(50) NOT NULL,
    Region      varchar(50) NOT NULL,
    State       varchar(50) NOT NULL
) ON [PRIMARY];
GO

SQL command to create a customer dimension table.


CREATE TABLE dbo.FurnitureDim (
    FurnitureKey int IDENTITY(1,1) NOT NULL,
    Type         varchar(25) NOT NULL,
    Category     varchar(25) NOT NULL,
    Material     varchar(25) NOT NULL
) ON [PRIMARY];
GO

SQL command to create a furniture dimension table.


CREATE TABLE dbo.SaleFact (
    DateKey      int NOT NULL,
    CustomerKey  int NOT NULL,
    FurnitureKey int NOT NULL,
    Quantity     int NOT NULL,
    Income       float NOT NULL,
    Discount     float NOT NULL
) ON [PRIMARY];
GO

SQL command to create a sale fact table.


Build Primary Keys:

ALTER TABLE dbo.DateDim ADD
    CONSTRAINT PK_DateDim PRIMARY KEY (DateKey) ON [PRIMARY];

ALTER TABLE dbo.CustomerDim ADD
    CONSTRAINT PK_CustomerDim PRIMARY KEY (CustomerKey) ON [PRIMARY];

ALTER TABLE dbo.FurnitureDim ADD
    CONSTRAINT PK_FurnitureDim PRIMARY KEY (FurnitureKey) ON [PRIMARY];

ALTER TABLE dbo.SaleFact ADD
    CONSTRAINT PK_SaleFact PRIMARY KEY (DateKey, FurnitureKey, CustomerKey) ON [PRIMARY];
Build Foreign Keys:

ALTER TABLE dbo.SaleFact ADD
    CONSTRAINT FK_DateDim FOREIGN KEY (DateKey)
        REFERENCES dbo.DateDim (DateKey),
    CONSTRAINT Fk_CustomerKey FOREIGN KEY (CustomerKey)
        REFERENCES dbo.CustomerDim (CustomerKey),
    CONSTRAINT Fk_FurnitureKey FOREIGN KEY (FurnitureKey)
        REFERENCES dbo.FurnitureDim (FurnitureKey);
GO
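To check the finished star schema end to end without a SQL Server instance, the following sketch recreates a trimmed version of it in SQLite through Python's sqlite3 module. Because this is SQLite, IDENTITY becomes INTEGER PRIMARY KEY, float becomes REAL, and foreign keys are only enforced after `PRAGMA foreign_keys = ON`. Some dimension columns are omitted for brevity. The sketch loads a few invented rows, shows a foreign key rejecting an orphan fact row, and runs a typical star join: quantity and income by furniture material and year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE DateDim (DateKey INTEGER PRIMARY KEY, Date TEXT NOT NULL, Year INTEGER NOT NULL);
    CREATE TABLE CustomerDim (CustomerKey INTEGER PRIMARY KEY, Name TEXT NOT NULL,
                              City TEXT NOT NULL);
    CREATE TABLE FurnitureDim (FurnitureKey INTEGER PRIMARY KEY, Type TEXT NOT NULL,
                               Category TEXT NOT NULL, Material TEXT NOT NULL);
    CREATE TABLE SaleFact (
        DateKey      INTEGER NOT NULL REFERENCES DateDim (DateKey),
        CustomerKey  INTEGER NOT NULL REFERENCES CustomerDim (CustomerKey),
        FurnitureKey INTEGER NOT NULL REFERENCES FurnitureDim (FurnitureKey),
        Quantity     INTEGER NOT NULL,
        Income       REAL NOT NULL,
        Discount     REAL NOT NULL,
        PRIMARY KEY (DateKey, FurnitureKey, CustomerKey));
""")
conn.executemany("INSERT INTO DateDim VALUES (?, ?, ?)",
                 [(1, "2024-01-10", 2024), (2, "2025-03-05", 2025)])
conn.execute("INSERT INTO CustomerDim VALUES (1, 'Acme Offices', 'Leeds')")
conn.executemany("INSERT INTO FurnitureDim VALUES (?, ?, ?, ?)", [
    (1, "chair", "office", "wood"), (2, "table", "kitchen", "marble"),
    (3, "wardrobe", "bedroom", "wood")])
conn.executemany("INSERT INTO SaleFact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 500.0, 50.0), (1, 1, 2, 1, 900.0, 0.0),
    (2, 1, 3, 2, 1200.0, 100.0), (2, 1, 1, 4, 220.0, 0.0)])

# A fact row pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO SaleFact VALUES (1, 99, 3, 1, 10.0, 0.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Typical star join: quantity and income by material and year.
by_material_year = conn.execute("""
    SELECT f.Material, d.Year, SUM(s.Quantity), SUM(s.Income)
    FROM SaleFact AS s
    JOIN FurnitureDim AS f ON f.FurnitureKey = s.FurnitureKey
    JOIN DateDim AS d ON d.DateKey = s.DateKey
    GROUP BY f.Material, d.Year
    ORDER BY f.Material, d.Year""").fetchall()
print(orphan_rejected, by_material_year)
# True [('marble', 2024, 1, 900.0), ('wood', 2024, 10, 500.0), ('wood', 2025, 6, 1420.0)]
```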
