"Model"?
[Figure: a Blueprint models a Planned House before the House is built; a Data Model plays the same role for a System.]
Data Model
Entity
Attribute
Characteristic of an entity
Relationship
Describes an association among entities
One-to-one (1:1) relationship
One-to-many (1:M) relationship
Many-to-many (M:N or M:M) relationship
Constraint
A restriction placed on the data
Business Rules
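A constraint that comes from a business rule can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the rule itself (sales cannot be negative) and the sample values are assumptions made up for the example, while the Branch_Sales columns follow the table that appears later in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Branch_Sales (
        Br_Cod TEXT PRIMARY KEY,
        Year   INTEGER NOT NULL,
        Sales  REAL NOT NULL CHECK (Sales >= 0)  -- assumed business rule
    )
""")

# A row that satisfies the rule is accepted.
conn.execute("INSERT INTO Branch_Sales VALUES ('B1', 2023, 1500.0)")

# A row that violates the rule is refused by the database itself.
try:
    conn.execute("INSERT INTO Branch_Sales VALUES ('B2', 2023, -10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Branch_Sales").fetchone()[0]
print(rejected, row_count)
```

Declaring the rule in the schema means every application that writes to the table is held to it, rather than each one re-implementing the check.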
[Figure: data warehouse architecture. Source Systems (External Source System, Operational Data, Other Custom Databases) feed ETL into a Staging Area, then the Data Warehouse (Analytical / Historical Data) and Data Marts. A Reporting / Querying / OLAP Viewing Layer (Ad hoc Querying, Reporting, OLAP Analysis, Alerts) handles Delivery to the Analysts Portal, External Portals, Client Reporting, Performance Reporting, and Marketing and Sales Teams.]
One to Many
Branch_Master(Br_Cod, Ctry_Cod)
Country(Ctry_Cod, Name)
Many to Many
Artist(Artist_ID, Name)
Album(Album_ID, Album_Name)
Link_Artist_Album(Artist_ID, Album_ID)
Branch_Sales(Br_Cod (PK), Year, Sales)
Branch_Master(Br_Cod (PK), Ctry_Cod)

Branch_Master(Br_Cod (PK), Ctry_Cod)
Country(Ctry_Cod (PK), Name)

Artist(Artist_ID (PK), Name)
Link_Artist_Album(Artist_ID (PK), Album_ID (PK))
Album(Album_ID (PK), Album_Name)
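The one-to-many and many-to-many examples can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the table and column names come from the slides, while the data types and sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- One to many: one Country row can be referenced by many Branch_Master rows.
CREATE TABLE Country (
    Ctry_Cod TEXT PRIMARY KEY,
    Name     TEXT NOT NULL
);
CREATE TABLE Branch_Master (
    Br_Cod   TEXT PRIMARY KEY,
    Ctry_Cod TEXT NOT NULL REFERENCES Country(Ctry_Cod)
);

-- Many to many: the link table holds one row per (artist, album) pair.
CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Album  (Album_ID  INTEGER PRIMARY KEY, Album_Name TEXT NOT NULL);
CREATE TABLE Link_Artist_Album (
    Artist_ID INTEGER NOT NULL REFERENCES Artist(Artist_ID),
    Album_ID  INTEGER NOT NULL REFERENCES Album(Album_ID),
    PRIMARY KEY (Artist_ID, Album_ID)
);
""")

# Sample rows (made up for the example).
conn.executemany("INSERT INTO Country VALUES (?, ?)",
                 [("IN", "India"), ("US", "United States")])
conn.executemany("INSERT INTO Branch_Master VALUES (?, ?)",
                 [("B1", "IN"), ("B2", "IN"), ("B3", "US")])
conn.executemany("INSERT INTO Artist VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")])
conn.executemany("INSERT INTO Album VALUES (?, ?)", [(10, "Album X"), (11, "Album Y")])
conn.executemany("INSERT INTO Link_Artist_Album VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10)])

# One to many: count the branches that point at each country.
branches_per_country = dict(conn.execute(
    "SELECT Ctry_Cod, COUNT(*) FROM Branch_Master GROUP BY Ctry_Cod"
).fetchall())

# The foreign key refuses a branch whose country does not exist.
try:
    conn.execute("INSERT INTO Branch_Master VALUES ('B4', 'ZZ')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Many to many: navigate the link table in both directions.
albums_of_artist_a = [r[0] for r in conn.execute("""
    SELECT al.Album_Name FROM Link_Artist_Album l
    JOIN Album al ON al.Album_ID = l.Album_ID
    WHERE l.Artist_ID = 1 ORDER BY al.Album_Name""")]
artists_on_album_x = [r[0] for r in conn.execute("""
    SELECT ar.Name FROM Link_Artist_Album l
    JOIN Artist ar ON ar.Artist_ID = l.Artist_ID
    WHERE l.Album_ID = 10 ORDER BY ar.Name""")]

print(branches_per_country, fk_rejected, albums_of_artist_a, artists_on_album_x)
```

The composite primary key on Link_Artist_Album prevents the same artist and album from being linked twice.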
Entity
Table
Attribute
Column
Primary Key
Rule
Relationship
Foreign Key
Database Normalization
NORMALIZATION is the process of efficiently organizing data in a
database to meet the following goals:
Eliminating redundant data
Ensuring proper data dependencies
Advantages of Normalization
Reduces the amount of space a database consumes
Stores data logically and prevents data anomalies
Gives faster processing in OLTP systems
Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
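As a rough illustration of the two goals above, the sketch below splits a flat table into two so that each fact is stored once. It borrows the Branch_Master and Country names used elsewhere in the deck; the flat layout, the sample rows, and the variable names are assumptions made for the example.

```python
# Flat rows: (Br_Cod, Ctry_Cod, country name). The country name is repeated
# for every branch, and it depends on Ctry_Cod rather than on the key Br_Cod.
flat_rows = [
    ("B1", "IN", "India"),
    ("B2", "IN", "India"),
    ("B3", "US", "United States"),
]

# Decompose: each branch keeps only its country code ...
branch_master = {br_cod: ctry_cod for br_cod, ctry_cod, _ in flat_rows}
# ... and each country name is stored once, keyed by its code.
country = {ctry_cod: name for _, ctry_cod, name in flat_rows}

# The split loses nothing: joining the two tables rebuilds the flat rows.
rejoined = [(br_cod, ctry_cod, country[ctry_cod])
            for br_cod, ctry_cod in branch_master.items()]

# Renaming a country is now a single update instead of one per branch.
country["IN"] = "Republic of India"

print(len(flat_rows), len(country), rejoined == flat_rows)
```

Storing the name once is what removes the update anomaly: in the flat layout, a rename that misses one row leaves the data contradicting itself.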
De-Normalization
The process of introducing redundancy into a normalized database in order to address
performance problems
First normalize, then identify performance problems and exhaust normal tuning
methods; only then de-normalize
De-normalize a database to reduce the number of joins required in a query, usually for
reporting purposes
FACT tables are normalized; DIMENSION tables often contain de-normalized
data
The normalized alternative to the Star Schema is the Snowflake Schema
Normalized
Product(Prod_Code (PK), Prod_Name, Brand_Code)
Brand(Brand_Code, Brand_Manager)
De-normalized Product
Product(Prod_Code (PK), Prod_Name, Brand_Code, Brand_Manager)
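The Product and Brand example can be sketched with Python's built-in sqlite3 module. The column names follow the slide; the name Product_Denorm, the data types, and the sample rows are assumptions introduced so that both versions can live in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: the brand manager is stored once per brand.
CREATE TABLE Brand (
    Brand_Code    TEXT PRIMARY KEY,
    Brand_Manager TEXT
);
CREATE TABLE Product (
    Prod_Code  TEXT PRIMARY KEY,
    Prod_Name  TEXT,
    Brand_Code TEXT REFERENCES Brand(Brand_Code)
);
INSERT INTO Brand VALUES ('BR1', 'Asha'), ('BR2', 'Ben');
INSERT INTO Product VALUES
    ('P1', 'Soap', 'BR1'), ('P2', 'Shampoo', 'BR1'), ('P3', 'Tea', 'BR2');

-- De-normalized: the manager is copied onto every product row.
CREATE TABLE Product_Denorm AS
SELECT p.Prod_Code, p.Prod_Name, p.Brand_Code, b.Brand_Manager
FROM Product p JOIN Brand b ON b.Brand_Code = p.Brand_Code;
""")

# The normalized report needs a join ...
normalized = conn.execute("""
    SELECT p.Prod_Name, b.Brand_Manager
    FROM Product p JOIN Brand b ON b.Brand_Code = p.Brand_Code
    ORDER BY p.Prod_Code""").fetchall()

# ... while the de-normalized report reads a single table.
denormalized = conn.execute("""
    SELECT Prod_Name, Brand_Manager
    FROM Product_Denorm
    ORDER BY Prod_Code""").fetchall()

print(normalized == denormalized, denormalized)
```

The price of the simpler query is redundancy: a change of brand manager now has to be applied to every product row of that brand.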
Dimensional Modeling
Dimensional modeling (DM) is a LOGICAL design technique often used
for Data Warehouses
A dimensional model is composed of a central FACT Table and a set of smaller tables called
DIMENSION Tables
The physical architecture of a Dimensional Model is represented as a STAR
Schema or a SNOWFLAKE Schema
Advantages
Dimensional Model is a predictable, standard framework.
Extensible to accommodate unexpected new data elements and design
decisions
Supports SLOWLY CHANGING Dimensions
Used for calculating SUMMARIZED data
Dimensional Modeling
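[Figure: ERM versus Dimensional Modeling. Labels: Transaction Performance, Query Performance, Normal Reports. One path runs from the Production System to the End User; the other runs from the Production System through the Data Warehouse to the End User.]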
DM gives end users a better way to access the data contained in the
organization's operational systems
Dim_Time(Time (PK), Day, Month, Quarter, Year)
Dim_Product(Product (PK), Prod_Name, Prod_Desc, Category)
Fact_Sales(Time (PK), Product (PK), Geography (PK), Customer (PK), Unit_Sales, Price, Sales_Amount)
Dim_Geography(Geography (PK), Branch, City, State, Country)
Dim_Customer(Customer (PK), Cust_Name, Cust_Phone, Email)
Drawbacks
May lead to multiple dimension tables
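A star schema query for summarized data might look like the sketch below. It uses Python's built-in sqlite3 module and keeps only two of the dimensions to stay short; the data types and sample rows are assumptions, and Dim_Time is assumed to be the name of the time dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Time (
    Time INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER,
    Quarter INTEGER, Year INTEGER
);
CREATE TABLE Dim_Product (
    Product INTEGER PRIMARY KEY, Prod_Name TEXT, Prod_Desc TEXT, Category TEXT
);
-- The fact table's key is made up of the dimension keys.
CREATE TABLE Fact_Sales (
    Time         INTEGER REFERENCES Dim_Time(Time),
    Product      INTEGER REFERENCES Dim_Product(Product),
    Unit_Sales   INTEGER,
    Price        REAL,
    Sales_Amount REAL,
    PRIMARY KEY (Time, Product)
);
INSERT INTO Dim_Time VALUES
    (1, 15, 1, 1, 2023), (2, 20, 4, 2, 2023), (3, 5, 1, 1, 2024);
INSERT INTO Dim_Product VALUES
    (10, 'Soap', 'Bath soap', 'Personal Care'),
    (20, 'Tea',  'Black tea', 'Beverages');
INSERT INTO Fact_Sales VALUES
    (1, 10, 5, 2.0, 10.0), (1, 20, 3, 4.0, 12.0),
    (2, 10, 10, 2.0, 20.0), (3, 20, 2, 4.0, 8.0);
""")

# Each dimension is exactly one join away from the fact table.
sales_by_year_category = conn.execute("""
    SELECT t.Year, p.Category, SUM(f.Sales_Amount)
    FROM Fact_Sales f
    JOIN Dim_Time    t ON t.Time    = f.Time
    JOIN Dim_Product p ON p.Product = f.Product
    GROUP BY t.Year, p.Category
    ORDER BY t.Year, p.Category""").fetchall()

for row in sales_by_year_category:
    print(row)
```

Grouping by any other dimension attribute (Quarter, Prod_Name, and so on) only changes the SELECT and GROUP BY lists; the joins stay the same.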
Dim_Year(Year (PK))
Dim_Qtr(Quarter (PK))
Dim_Mth(Month (PK))
Dim_Time(Time (PK), Day, Month, Quarter, Year)
Dim_Product(Product (PK), Prod_Name, Prod_Desc, Category)
Fact_Sales(Time (PK), Product (PK), Geography (PK), Customer (PK), Unit_Sales, Price, Sales_Amount)
Dim_Geography(Geography (PK), Branch, City, State, Country)
Dim_City(City (PK))
Dim_State(State (PK))
Dim_Country(Country (PK))
Dim_Customer(Customer (PK), Cust_Name, Cust_Phone)
Drawbacks
Complex queries and more foreign key joins
Complicated maintenance
Explosion in the number of tables in the database
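The extra joins can be seen in a small sketch using Python's built-in sqlite3 module. Fact_Sales, Dim_Geography, and Dim_Country are cut down to the columns needed here; the Region attribute, the data types, and the sample rows are assumptions added so that the outer table has something to report on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked geography: country details live in their own table.
CREATE TABLE Dim_Country (
    Country TEXT PRIMARY KEY,
    Region  TEXT                -- hypothetical attribute, not on the slide
);
CREATE TABLE Dim_Geography (
    Geography INTEGER PRIMARY KEY,
    Branch    TEXT,
    Country   TEXT REFERENCES Dim_Country(Country)
);
-- Simplified fact table: only the geography key and one measure.
CREATE TABLE Fact_Sales (
    Geography    INTEGER REFERENCES Dim_Geography(Geography),
    Sales_Amount REAL
);
INSERT INTO Dim_Country VALUES ('IN', 'Asia'), ('JP', 'Asia'), ('US', 'Americas');
INSERT INTO Dim_Geography VALUES
    (1, 'Mumbai Main', 'IN'), (2, 'Tokyo Central', 'JP'), (3, 'Boston Downtown', 'US');
INSERT INTO Fact_Sales VALUES (1, 100.0), (2, 50.0), (3, 70.0), (1, 25.0);
""")

# Reaching a country-level attribute takes two joins from the fact table;
# in a star schema the same attribute would sit on Dim_Geography (one join).
sales_by_region = conn.execute("""
    SELECT c.Region, SUM(f.Sales_Amount)
    FROM Fact_Sales f
    JOIN Dim_Geography g ON g.Geography = f.Geography
    JOIN Dim_Country   c ON c.Country   = g.Country
    GROUP BY c.Region
    ORDER BY c.Region""").fetchall()

print(sales_by_region)
```

Every further level of snowflaking (city, state, quarter, and so on) adds another table to maintain and another join to each query that needs it.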
Dimensional Modeling
[Figure labels: Less indexed, Highly indexed, Normal Reports.]
Subject areas to facilitate the view of data marts and merging them into the
Enterprise Wide Data Warehouse (EDW)
Reports - Standard set of reports provided by ERwin
Vague Purpose
Don't build a model without understanding the business
rationale. The purpose of a model dictates its level of detail
(just entities and relationships, or fully attributed with data types
and full constraints).
Literal Modeling
Data modeling cannot be done by taking customer input
literally. We need to capture and solve the problem that the
customer is imperfectly describing, paying attention
to the hidden true requirements. You must interpret and
abstract what the customer tells you.
Speculative Content
At least 90 percent of a model should pertain to immediate
needs. As much as 10 percent can anticipate future needs.
Otherwise you run the risk of scope creep.
Parallel Attributes
Parallel attributes are acceptable for a data warehouse and
are often used in dimensions to simplify queries.
Anonymous Fields
As much as possible, you should clearly describe the data
being stored and not use anonymous fields.
Consider a location table with anonymous fields
like Addr1, Addr2 and Addr3, where any information can be kept.
To find a city, you must search multiple fields.
It would be much better to put address information in distinct
fields that are clearly named.
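The difference shows up as soon as you query for a city. The sketch below uses Python's built-in sqlite3 module; Addr1, Addr2 and Addr3 come from the slide, while the table names, the Street, City and State columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anonymous fields: the city may sit in any of the three columns.
CREATE TABLE Location_Anon (
    Loc_ID INTEGER PRIMARY KEY, Addr1 TEXT, Addr2 TEXT, Addr3 TEXT
);
INSERT INTO Location_Anon VALUES
    (1, '12 MG Road', 'Pune', 'MH'),
    (2, 'Pune', 'MH', NULL),
    (3, 'Flat 4', '7 Park St', 'Kolkata');

-- Named fields: each column has one meaning.
CREATE TABLE Location_Named (
    Loc_ID INTEGER PRIMARY KEY, Street TEXT, City TEXT, State TEXT
);
INSERT INTO Location_Named VALUES
    (1, '12 MG Road', 'Pune', 'MH'),
    (2, NULL, 'Pune', 'MH'),
    (3, 'Flat 4, 7 Park St', 'Kolkata', 'WB');
""")

city = "Pune"

# With anonymous fields, every column has to be searched.
anon_ids = [r[0] for r in conn.execute("""
    SELECT Loc_ID FROM Location_Anon
    WHERE Addr1 = ? OR Addr2 = ? OR Addr3 = ?
    ORDER BY Loc_ID""", (city, city, city))]

# With named fields, the query says exactly what it means.
named_ids = [r[0] for r in conn.execute(
    "SELECT Loc_ID FROM Location_Named WHERE City = ? ORDER BY Loc_ID", (city,))]

print(anon_ids, named_ids)
```

The anonymous version also cannot tell a city from a street or state that happens to share its name, and no single index covers the search.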
Tool Name          Company Name
ERwin              Computer Associates
Embarcadero        Embarcadero Technologies
Power Designer     Sybase Corp
Oracle Designer    Oracle Corp
Rational Rose      IBM
Q&A
Thanks!