Sunteți pe pagina 1din 37

<Insert Picture Here>

Data Warehousing Concepts


Business Intelligence - Team
Oracle Financial Services Consulting

Data Modeling Concepts - Coverage


What is Data Modeling, Why Model?
Data Modeling Terminology
Logical and Physical Model
Normalization and De-Normalization
Dimensional Modeling
Star and Snowflake Schema
Fact and Dimension Tables
Relational vs. Dimensional Modeling
Erwin Case Tool

Data Modeling Concepts

"Model"?

A model is a symbolic or abstract representation of


something real or imagined.

What is Data Modeling ??


Is a method used to define and analyze data
requirements needed to support the business processes
of an organization.
A Data model is a conceptual representation of data
structures (tables) required for a database
Powerful in expressing and communicating the business
requirements
Visually represents the nature of data , business rules
governing the data and the way it is organized in the
system.
4

Why .. Why Data models

Why ?? (contd ..)

Business rules , processes change over period of


time. So a small change leads to large changes in
the computer systems.
Entity types incorrectly identified. This leads to data
replication.
No standardization of data. So data cannot be shared
with customers or Internal Management.

Planned House

Data Model

House

System

Blueprint

Data Model

House built on a good blueprint can be used for


many purposes.
System built in a good data model can have new
ways of doing business, new lines of business,
even new businesses - without throwing out the
system.
7

Data Model Basic Building


Blocks

Entity

Anything about which data will be collected/stored

Attribute
Characteristic of an entity

Relationship
Describes an association among entities
One-to-one (1:1) relationship
One-to-many (1:M) relationship
Many-to-many (M:N or M:M) relationship

Constraint
A restriction placed on the data

Business Rules

Brief, precise and unambiguous descriptions of


policies, procedures or principles within the
organization

Describe characteristics of the data as viewed by the


company

Translating Business Rules


to Data Model Components
Standardize companys view of data
Communication tool between users and designers
Allow designer to understand the nature, role and scope
of data
Allow designer to understand business processes
Allow designer to develop appropriate relationship
participation rules and constraints

Promote creation of an accurate data model


Nouns translate into entities
Verbs translate into relationships among entities
Relationships are bi-directional

Typical Data Flow


Data Modeling: Important in Staging, DWH and DataMarts
M E TADATA
Internal
Source System

Analysts Portal

Data Marts
Ad hoc Querying /
Reporting / Viewing

ETL

External
Source System

Operational
Data
Staging
Area
Analytical /
Historical Data

Data Mart Builder

Data Integration Layer

OLAP Analysis

Alerts

External Portal
Other Custom
Databases

Performance
Reporting
Client
Reporting

ETL

Source Systems

Data Modeling Concepts

Marketing and
Sales Teams

Data Warehouse

Marts

Reporting /
Querying /
OLAP Viewing
Layer

External
Portals

Delivery

11

Data Modeling Terminology


ENTITY: The entity is a person, object, place or event for which data is
collected
ATTRIBUTE: Parameters which define the properties of an entity
RELATIONSHIPS: Business rules that determine how entities interact
with each other
e.g. Sales Representative SERVES Customer
CARDINALITY: Defines the relationship between the entities in terms of
numbers
OPTIONAL: Sales Representative could have zero or many customers
MANDATORY: At least one product should be listed in an order

Data Modeling Concepts

12

Data Modeling Terminology contd.


PRIMARY KEY: Column(s) to uniquely identify each record in a table
FOREIGN KEY: Identifies column(s) in one table that refers to columns(s)
in another table (parent)
One to One
Branch_Master(Br_Cod, Ctry_Cod)
Branch_Sales(Br_Cod, Year, Sales)

One to Many
Branch_Master(Br_Cod, Ctry_Cod)
Country(Ctry_Cod, Name)

Many to Many
Artist(Artist_ID, Name)
Album(Album_ID, Album_Name)
Link_Artist_Album(Artist_ID, Album_ID)

Branch_Sales

Branch_Master

Br_Cod (PK)

Br_Cod (PK)

Year

Ctry_Cod

Sales

Branch_Master
Br_Cod (PK)

Country

Ctry_Cod

Ctry_Cod (PK)
Name

Artist

Link_Artist_Album

Artist_ID (PK)

Artist_ID (PK)

Name

Album_ID (PK)

Album
Album_ID (PK)
Album_Name

Data Modeling Concepts

13

Logical and Physical Model


LOGICAL Model: Representation of the business requirements,
entities, attributes and relationships
PHYSICAL Model: Includes tables, columns, constraints,
database properties for physical implementation
Logical Data Model

Physical Data Model

Represents business information and


defines business rules

Represents physical implementation of the


model in a database.

Entity

Table

Attribute

Column

Primary Key

Primary Key Constraint

Rule

Check Constraint, Default Value

Relationship

Foreign Key

Data Modeling Concepts

14

Database Normalization
NORMALIZATION is the process of efficiently organizing data in a
database to meet following goals
Eliminating redundant data
Ensuring proper data dependencies

Advantages of Normalization
Reduce the amount of space a database consumes
Data is logically stored and prevent data anomalies
Faster Processing in OLTP systems

Normal Forms
First Normal Form
Second Normal Form
Third Normal Form

Why not higher Normal Forms


Requires high-end database features
Complexity increases, size constraints
Most applications work well with 3NF

Data Modeling Concepts

15

De-Normalization
Process of introducing redundancy in a normalized database in order to address
performance problems
First Normalize, then identify performance problems, exhaust normal tuning
methods, then go for denormalization
De-normalize a database to reduce number of joins required in a query, usually for
reporting purposes
FACT Tables are normalized, DIMENSIONAL tables often contain de-normalized
data
Normalized alternative to Star Schema is Snowflake Schema
De-normalized Product

Normalized Product Tables

Product
Prod_Code (PK)
Prod_Name
Brand_Code

Product
Brand
Brand_Code
Brand_Manager

Prod_Code (PK)
Prod_Name
Brand_Code

Brand_Manager

Data Modeling Concepts

16

Dimensional Modeling
Dimensional modeling (DM) is a LOGICAL design technique often used
for Data Warehouses
Composed of a central FACT Table, and a set of smaller tables called
DIMENSION Tables
The physical architecture of Dimensional Model is represented in STAR
Schema or SNOWFLAKE Schema
Advantages
Dimensional Model is a predictable, standard framework.
Extensible to accommodate unexpected new data elements and design
decisions
Supports SLOWLY CHANGING Dimensions
Used for calculating SUMMARIZED data

Data Modeling Concepts

17

Differences : Database Vs Dataware house


Relational vs. Dimensional
Relational Modeling

Dimensional Modeling

Data is stored in RDBMS tables

Data is stored in RDBMS or MDBs /


cubes

Data is normalized and optimized for


OLTP

Data is de-normalized and optimized


for OLAP, DWH

Transaction Performance

Query Performance

Volatile (many updates) and time


variant

Non-volatile and usually time invariant

Detailed level of segregated


transaction data

Aggregated data and measures used

Normal Reports

Drag and Drop multidimensional OLAP


reports

Relational vs. Dimensional (Cont)


DM

ERM
Production
System

End
User

Production
System

End
User

Data
Warehouse

DM gives end users a better way to access the data contained in the
organization's operational systems

Relational vs. Dimensional (Cont)

STAR Schema (Example)


Dim_Time

Dim_Product

Time (PK)

Product (PK)

Day

Prod_Name

Month
Quarter
Year

Fact_Sales
Time (PK)

Prod_Desc
Category

Product (PK)
Geography (PK)
Customer (PK)
Unit_Sales
Price
Sales_Amount

Dim_Geography
Geography (PK)
Branch
City
State

Dim_Customer
Customer (PK)
Cust_Name
Cust_Phone
Email

Country

Data Modeling Concepts

21

STAR Schema contd.


Fact Tables are Normalized, Dimension Tables are De-normalized
Advantages

Easier to understand and navigate


Better performance minimizes number of joins
Supports multi-dimensional analysis
Extensible design supports changing business requirements
Allows relative easy maintenance
Recommended for most Decision Support Systems

Drawbacks
May lead to multiple dimension tables

Data Modeling Concepts

22

SNOWFLAKE Schema (Example)

Dim_Year

Dim_Qtr

Year (PK)

Quarter (PK)

Dim_Product
Product (PK)
Prod_Name

Year
Dim_Mth

Dim_Time

Month (PK)

Time (PK)
Day

Quarter

Month

Fact_Sales
Time (PK)

Prod_Desc
Category

Product (PK)
Geography (PK)
Customer (PK)
Unit_Sales

Dim_Geography
Geography (PK)
Dim_City

Dim_Country

Dim_State

City (PK)

State (PK)

State

Price

Sales_Amount

Dim_Customer
Customer (PK)
Cust_Name

Branch

Cust_Phone

City

Email

Country

Country (PK)

Data Modeling Concepts

23

SNOWFLAKE Schema contd.


Fact Tables are Normalized, Dimension Tables are Normalized
Advantages
Avoids redundancy and saves storage
Should improve understanding and overall performance
Quick response time when queries involve aggregation

Drawbacks
Complex queries and more foreign key joins
Complicated maintenance
Explosion in the number of tables in the database

Data Modeling Concepts

24

Relational Vs. Dimensional Modeling


Relational Modeling

Dimensional Modeling

Data is stored in RDBMS tables

Data is stored in RDBMS or MDBs / cubes

Data is normalized and optimized for OLTP

Data is de-normalized and optimized for


OLAP, DWH

Entity Driven, Transaction Performance

Data Driven, Query Performance

Less indexed

Highly indexed

Volatile (many updates) and time variant

Non-volatile and usually time invariant

Detailed level of segregated transaction data

Aggregated data and measures used

Normal Reports

Drag and Drop multidimensional OLAP


reports

Decision to go for OLTP or Data Warehouse is determined by the


business needs of the organization

Data Modeling Concepts

25

ERwin Database Design and Modeling Tool


Effective case tool for Logical / Physical data modeling
Supports Dimensional Modeling
Entity, Attributes and Relationships can be easily defined
Erwin talks to the back-end database
Reverse Engineering and
Forward Engineering

Subject areas to facilitate the view of data marts and merging them into the
Enterprise Wide Data Warehouse (EDW)
Reports - Standard set of reports provided by Erwin

Data Modeling Concepts

26

Things to avoid in a Data Modeling

Vague Purpose
Dont build a model without understanding the business
rationale. The purpose for a model dictates the level of detail
(just entities and relationships, fully attributed, with data types
and full constraints).

Literal Modeling
Data modeling cannot be done literally only with Customer
inputs. We need to capture and solve the problem that the
customer is imperfectly describing. We need to pay attention
to the hidden true requirements. You must interpret and
abstract what the customer tells you.

27

Things to avoid in a Data Modeling


Large Size
As a general rule, a model to be no more than 200 tables.
The reason is that large models involve more work. Need to
simplify with high level of abstraction.
Create Subject Areas for better readability and maintenance.

Speculative Content
At least 90 percent of a model should pertain to immediate
needs. As much as 10 percent can anticipate future needs.
Otherwise you run the risk of scope creep .

28

Things to avoid in a Data Modeling


Lack of Clarity
Normally a model should not be made difficult to understand
for humans. This can be achieved by using DOMAINS , UDPs
etc.
Violation of Normal Forms

An operational application concerns the routine


operations of a business.
An analytical applications emphasize complex queries
that read large quantities of data
Do not violate normal forms, except for analytical applications
and performance bottlenecks.

29

Things to avoid in a Data Modeling


Needless Redundancy
Ideally a database should have a single recording of each
data item.
Dont include redundant data in an attempt to compensate for
a poorly conceived application.

Parallel Attributes
Parallel attributes are acceptable for a data warehouse and
are often used in dimensions to simplify queries.

30

Things to avoid in a Data Modeling

Anonymous Fields
As much as possible, you should clearly describe the data
being stored and not use anonymous fields.
a location table with anonymous fields. To find a city, you
must search multiple fields.
like Addr1 , Addr2 anad Addr3 where any info can be kept.
It would be much better to put address information in distinct
fields that are clearly named.

31

Data Quality through Data Modeling


Data quality is the absence of intolerable defects. It is
not the absence of defects.
All data quality defects fall into a set of 9 broad
buckets:

1. Lacking integrity of reference between data values across the model


2. Entities lack unique identification
3. Unreasonable values
4. Attributes are used for multiple meanings
5. Inconsistent formatting
6. Incorrect data
7. Missing data
8. Miscalculations
9. Data that falls outside of its intended codification

32

The major data modeling constructs relevant to data


entry which relate to the data quality defect
categories are:
1. Uniqueness the enforcement that a column will have unique
values in it
2. Check guarantees that a columns value will fall in a predefined
range or list
3. Key forcing the integrity of desired references across entities
like the existence of the customer before he places an order
4. Mandatory forces a true value to be entered into a column
5. Default setting a value when none is entered
6. Null allowing null (no value) to be used in a column instead of
forcing a value

Defaults and null constraints are usually more


`problematic to data quality when used than when
not used because they allow for an abstract value (or
null, which is no value) to be used in place of a
customized, relevant value.
33

Popular Data Modeling Tools

Tool Name

Company Name

ERWin

Computer Associates

Embarcadero

Embarcadero Technologies

Power Designer

Sybase Corp

Oracle Designer

Oracle Corp

Rational Rose

IBM

All the above tools support Forward,Reverse


Engineering and ERD.

34

Q&A

Contact me for any queries

A.Rajesh Kumar a.rajesh.kumar@oracle.com


Desk No : 4918 1060

36

Thanks!
37

S-ar putea să vă placă și