Yasim Kolathayil
yasimk@gmail.com
yasim@damaninc.com
Yasim Kolathayil – yasimk@gmail.com 1
What is a Data Warehouse?
Why Data Warehousing?
[Figure: data warehousing architecture. Components: source OLTP systems, replicated data sets, data staging area, data warehouse, data marts 1 to 4, OLAP servers. Uses: static and ad-hoc reporting, graphical data analysis, ad-hoc query, end-user applications, reports.]
Inmon DW | Kimball DW
Hub and Spoke Model (Corporate Information Factory) | SWA Bus Architecture Model
Integrated DW with data at the atomic level | Integration through conformed dimensions and facts
Staging area and data warehouse together constitute the backroom | Transient staging area is the backroom
E-R model for the data warehouse | Only star schema for the data warehouse
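Conformed dimensions, the Kimball integration mechanism in the comparison above, let separate fact tables be compared through shared dimension attributes ("drill-across"). A minimal sketch, assuming hypothetical flight and baggage marts that share one `dim_date` table (all table and column names are illustrative), using Python's built-in `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed date dimension shared by two hypothetical data marts.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_flights (date_key INTEGER REFERENCES dim_date(date_key), flights INTEGER);
CREATE TABLE fact_baggage (date_key INTEGER REFERENCES dim_date(date_key), bags_mishandled INTEGER);
INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO fact_flights VALUES (1, 300), (1, 310), (2, 280);
INSERT INTO fact_baggage VALUES (1, 20), (2, 21), (2, 5);
""")

# Drill-across: aggregate each fact table separately to the shared
# month attribute, then join the two result sets on that attribute.
rows = conn.execute("""
    WITH f AS (SELECT d.month, SUM(x.flights) AS flights
               FROM fact_flights x JOIN dim_date d USING (date_key) GROUP BY d.month),
         b AS (SELECT d.month, SUM(y.bags_mishandled) AS bags
               FROM fact_baggage y JOIN dim_date d USING (date_key) GROUP BY d.month)
    SELECT f.month, f.flights, b.bags
    FROM f JOIN b ON f.month = b.month
    ORDER BY f.month
""").fetchall()
print(rows)
```

Each fact table is aggregated to the month before the join; joining the two fact tables row by row first can multiply matching rows and overstate both totals.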
DW Design Strategies
[Figure: DW design strategies. Labels: information and data; individually, departmentally and organizationally structured; less structured to more structured; data warehouse: history, normalized, detailed.]
What are the differences?
Performance Needs: Tuned for Update | Tuned for Query & Extraction | Tuned for Query | Not Applicable
The Multi-Dimensional Data Model
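In the multi-dimensional model, each measure value sits at the intersection of one member from every dimension, and rolling up means summing away the dimensions you no longer need. A minimal in-memory sketch, assuming a hypothetical sales cube with year, region and product dimensions:

```python
from collections import defaultdict

# Hypothetical cube cells: (year, region, product) -> sales amount.
cube = {
    ("2023", "East", "Widget"): 100,
    ("2023", "West", "Widget"): 150,
    ("2023", "East", "Gadget"): 80,
    ("2024", "East", "Widget"): 120,
    ("2024", "West", "Gadget"): 90,
}

DIMENSIONS = ("year", "region", "product")


def roll_up(cells, keep):
    """Sum the measure over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(int)
    for coords, value in cells.items():
        result[tuple(coords[i] for i in positions)] += value
    return dict(result)


print(roll_up(cube, ["year"]))               # totals per year
print(roll_up(cube, ["region", "product"]))  # totals per region and product
```

Keeping an empty list of dimensions collapses the cube to its grand total.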
The “Classic” Star Schema
Baggage Data Mart
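A star schema places one fact table of numeric measures at the center, with each dimension table joined to it by a single key. A minimal sketch of what a baggage data mart might look like, assuming a hypothetical schema (`fact_baggage`, `dim_date`, `dim_airport` and their columns are illustrative, not taken from the slides), built with Python's `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_airport (airport_key INTEGER PRIMARY KEY, code TEXT, city TEXT);
-- Central fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_baggage (
    date_key INTEGER REFERENCES dim_date(date_key),
    airport_key INTEGER REFERENCES dim_airport(airport_key),
    bags_handled INTEGER,
    bags_mishandled INTEGER
);
INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_airport VALUES (10, 'JFK', 'New York'), (20, 'LHR', 'London');
INSERT INTO fact_baggage VALUES
    (1, 10, 5000, 12), (1, 20, 4000, 8),
    (2, 10, 5200, 15), (2, 20, 4100, 6);
""")

# Star join: each dimension joins directly to the fact table by one key.
rows = conn.execute("""
    SELECT d.month, a.city, SUM(f.bags_handled), SUM(f.bags_mishandled)
    FROM fact_baggage f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_airport a ON f.airport_key = a.airport_key
    GROUP BY d.month, a.city
    ORDER BY d.month, a.city
""").fetchall()
print(rows)
```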
Star and Snowflake Models
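In a snowflake model, dimension tables are normalized further, so some attributes move into sub-dimension tables and queries need extra joins. A minimal sketch, assuming a hypothetical baggage schema in which the airport dimension is snowflaked into a separate `dim_city` table (all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked airport dimension: city attributes live in their own table.
CREATE TABLE dim_city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_airport (airport_key INTEGER PRIMARY KEY, code TEXT,
                          city_key INTEGER REFERENCES dim_city(city_key));
CREATE TABLE fact_baggage (airport_key INTEGER REFERENCES dim_airport(airport_key),
                           bags_mishandled INTEGER);
INSERT INTO dim_city VALUES (1, 'New York', 'USA'), (2, 'London', 'UK');
INSERT INTO dim_airport VALUES (10, 'JFK', 1), (11, 'LGA', 1), (20, 'LHR', 2);
INSERT INTO fact_baggage VALUES (10, 12), (11, 4), (20, 8);
""")

# Reaching the country takes two joins (fact -> airport -> city) instead
# of the single join a denormalized star dimension would need.
rows = conn.execute("""
    SELECT c.country, SUM(f.bags_mishandled)
    FROM fact_baggage f
    JOIN dim_airport a ON f.airport_key = a.airport_key
    JOIN dim_city c ON a.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(rows)
```

The snowflake stores each city and country once, at the cost of the extra join on every query that needs them.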
OLTP vs. OLAP
OLTP OLAP
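The typical workload difference can be sketched briefly: OLTP issues short transactions that touch individual rows by key, while OLAP runs read-only queries that scan and aggregate many rows. A minimal illustration with Python's `sqlite3`, assuming a hypothetical `orders` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, status TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "East", "open", 100.0),
    (2, "West", "open", 250.0),
    (3, "East", "shipped", 75.0),
])
conn.commit()

# OLTP-style work: a short transaction that updates one row by its key.
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (1,))

# OLAP-style work: a read-only query that aggregates across many rows.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```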
What is METADATA?
Types of Metadata
Business Metadata – Captures business definitions, the structure and hierarchy of data, subject areas, metric definitions, etc.
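Business metadata can be kept as structured catalog entries, one per business term. A minimal sketch, assuming a hypothetical entry layout (the field names and the sample metric are illustrative, not from any particular metadata tool):

```python
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    """One business-metadata entry (hypothetical layout)."""
    term: str          # business name of the metric or attribute
    definition: str    # agreed business definition
    subject_area: str  # subject area the term belongs to
    hierarchy: list[str] = field(default_factory=list)  # roll-up path, most detailed level first
    formula: str = ""  # metric definition; empty for non-metric terms


catalog = {
    entry.term: entry
    for entry in [
        BusinessMetadata(
            term="Mishandled Bag Rate",
            definition="Mishandled bags per 1,000 bags handled",
            subject_area="Baggage",
            hierarchy=["Airport", "City", "Country"],
            formula="1000 * bags_mishandled / bags_handled",
        ),
    ]
}


def describe(term: str) -> str:
    """Render a catalog entry as a one-line business definition."""
    entry = catalog[term]
    return f"{entry.term} ({entry.subject_area}): {entry.definition}"


print(describe("Mishandled Bag Rate"))
```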
Metadata Architecture
[Figure: metadata architecture. Layers: access, replication, aggregation, loading, transformation, scrubbing, extraction, mapping.]
• Data extraction
• Data cleaning
• Data transformation: convert from legacy/host format to warehouse format
• Load: sort, summarize, consolidate, compute views, check integrity, build indexes, partition
• Refresh: propagate updates from sources to the warehouse
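The back-end steps listed above can be chained into a small pipeline. A minimal sketch, assuming a hypothetical legacy CSV export with DDMMYYYY dates and space-padded amounts; it covers extraction, cleaning, transformation and loading (with an index build), while refresh, sorting and summarizing are not shown:

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical legacy export: DDMMYYYY dates, padded and missing amounts.
LEGACY_CSV = """cust_id,sale_date,amount
C1 ,05012024, 100.50
C2,06012024,
C3,07012024,  75.25
"""


def extract(text):
    """Extraction: read raw records from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: trim whitespace and drop records with a missing amount."""
    cleaned = []
    for row in rows:
        row = {key: (value or "").strip() for key, value in row.items()}
        if row["amount"]:
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Transformation: convert legacy formats to warehouse formats."""
    return [
        (r["cust_id"],
         datetime.strptime(r["sale_date"], "%d%m%Y").date().isoformat(),
         float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: insert the records, then build an index for query access."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (cust_id TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.execute("CREATE INDEX IF NOT EXISTS ix_sales_date ON sales (sale_date)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract(LEGACY_CSV))))
print(conn.execute("SELECT * FROM sales ORDER BY sale_date").fetchall())
```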
Data Extraction Concepts
Change detection
How to recognize the changes?
It is important to understand when data has changed in the source system.
When to extract the data?
Choosing when to extract the data is a key consideration.
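One way to recognize changes when the source offers no change log is a snapshot differential: compare the current extract with the previous one by primary key. A minimal sketch, assuming hypothetical customer snapshots held as Python dictionaries keyed by customer ID:

```python
def diff_snapshots(previous, current):
    """Compare two snapshots keyed by primary key; return (inserts, updates, deletes)."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    return inserts, updates, deletes


# Hypothetical snapshots: customer_id -> (name, city).
yesterday = {101: ("Alice", "NY"), 102: ("Bob", "LA"), 103: ("Cara", "SF")}
today = {101: ("Alice", "NY"), 102: ("Bob", "Boston"), 104: ("Dan", "TX")}

inserts, updates, deletes = diff_snapshots(yesterday, today)
print(inserts)  # {104: ('Dan', 'TX')}
print(updates)  # {102: ('Bob', 'Boston')}
print(deletes)  # {103: ('Cara', 'SF')}
```

Comparing full snapshots is simple but reads the entire source every time; timestamp columns or log-based capture avoid that when the source system supports them.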
Data Transformation Concepts
Why cleansing?
More data from more sources can mean more errors in the data, and errors that are harder to trace.
Detecting and correcting data anomalies early pays off substantially (e.g., place a Validate component at the beginning of the graph).
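Early validation can be sketched as a step that splits incoming records into valid rows and rejects with reasons, before any transformation runs. A minimal plain-Python sketch, not any ETL tool's Validate component; the baggage fields and the 0 to 50 kg weight rule are hypothetical:

```python
def validate(records, required=("bag_id", "flight", "weight_kg")):
    """Split records into (valid, rejects); each reject carries its list of problems."""
    valid, rejects = [], []
    for rec in records:
        problems = [f"missing {name}" for name in required if rec.get(name) in (None, "")]
        weight = rec.get("weight_kg")
        if weight not in (None, ""):
            if not isinstance(weight, (int, float)):
                problems.append("weight_kg not numeric")
            elif not 0 < weight <= 50:  # hypothetical plausibility rule
                problems.append("weight_kg out of range")
        if problems:
            rejects.append((rec, problems))
        else:
            valid.append(rec)
    return valid, rejects


batch = [
    {"bag_id": "B1", "flight": "XY100", "weight_kg": 23.0},
    {"bag_id": "B2", "flight": "", "weight_kg": 18.5},
    {"bag_id": "B3", "flight": "XY200", "weight_kg": 120},
]
valid, rejects = validate(batch)
print([r["bag_id"] for r in valid])
print([(r["bag_id"], p) for r, p in rejects])
```

Keeping the reasons with each reject makes the errors traceable back to their source records.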
Transformation Rules
Load Concepts
Issues:
Techniques:
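One common load technique is an incremental upsert: update rows whose key already exists and insert the rest. A minimal sketch with Python's `sqlite3`, assuming a hypothetical `dim_customer` table:

```python
import sqlite3


def upsert(conn, rows):
    """Incremental load: update rows whose key exists, insert the others."""
    with conn:  # one transaction per batch
        for key, name, city in rows:
            cur = conn.execute(
                "UPDATE dim_customer SET name = ?, city = ? WHERE customer_id = ?",
                (name, city, key),
            )
            if cur.rowcount == 0:  # no existing row with this key
                conn.execute("INSERT INTO dim_customer VALUES (?, ?, ?)", (key, name, city))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
upsert(conn, [(1, "Alice", "NY"), (2, "Bob", "LA")])      # initial load
upsert(conn, [(2, "Bob", "Boston"), (3, "Cara", "SF")])  # incremental load
print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
```

This overwrites changed values, so the table keeps no history of the previous ones.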
Questions? About TDWI?
Example of a graphical breakeven analysis (taken from the Internet)
[Figure: Breakeven Analysis. Benefits and costs in dollars (0 to 4,000,000) plotted against years (0 to 6).]
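The breakeven point is the first year in which cumulative benefits reach cumulative costs. A minimal sketch, assuming hypothetical yearly amounts (not values read from the chart):

```python
from itertools import accumulate


def breakeven_year(benefits, costs):
    """Return the first year index whose cumulative benefits reach cumulative costs, or None."""
    for year, (cum_b, cum_c) in enumerate(zip(accumulate(benefits), accumulate(costs))):
        if cum_b >= cum_c:
            return year
    return None


# Hypothetical yearly amounts in dollars for years 0 to 6.
benefits = [0, 300_000, 600_000, 800_000, 800_000, 700_000, 700_000]
costs = [1_500_000, 300_000, 200_000, 200_000, 200_000, 200_000, 200_000]
print(breakeven_year(benefits, costs))  # 4
```

In year 4 cumulative benefits reach 2,500,000 against cumulative costs of 2,400,000, which is where the two curves cross.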
Wal-Mart - 500-600% Growth with No Decline
A Present-Day History Lesson
Wal-Mart
Kmart