Venkataraj Jayaraj
Data Warehousing
[Figure: reports generated directly from a source]
Reports are generated directly from a source, without any transformations.
Benefits:
Represents current data
Simple to design, and reports are easy to generate
Drawbacks:
No historical data, so it may not be useful in the decision-making process.
[Figure: source databases (Oracle, Teradata, DB2, SQL Server) feed a data warehouse holding metadata, raw data and summary data, which supports reporting and data mining]
TCS Confidential
Copyright 2007 by Tata Consultancy Services. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted
in any form or by any means – electronic, mechanical, photocopying, recording, or otherwise – without the permission of Tata Consultancy Services.
Benefits:
Improved performance
Simplified report generation
Contains historical data
Drawbacks:
No current data
Administration overhead
Source:
A database from which the data is extracted. Ex: Oracle, Teradata, Sybase, DB2
Staging area:
A temporary storage area used while the data is being processed.
Metadata:
Data about the data, or a description of the data.
Data mart:
A data mart is a data warehouse for a specific domain.
Such data marts extract the data directly from the source databases, and the data marts are then merged into the data warehouse.
Advantages:
Maximum utilization of resources (hardware, software, manpower)
Easy maintenance
Risk of failure is reduced
Disadvantages:
Total cost of development is very high
Integration problems
[Figure: data warehouse holding metadata, raw data and summary data, which supports reporting and data mining]
Advantages:
Total cost and time of development are very low
No integration problems
Disadvantages:
Resources cannot be fully utilized.
Data Warehousing
Data warehousing is the enabling technology that facilitates improved business decision-making. It is a process, not a product.
It is a technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing.
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.
[Figure: subject-oriented data. Data-entry fields (sales rep, quantity sold, product number, date, customer name, product description, unit price, mail address) grouped into the subjects Sales, Customers and Products]
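Integration of Data
Encoding: each application encodes the same attribute differently, and the data warehouse stores a single consistent encoding.
Appl. A: M, F
Appl. B: 1, 0
Appl. C: X, Y
Data warehouse: M, F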
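The encoding integration above can be sketched as a lookup per application. This is only an illustration: the function name `to_warehouse_code` is hypothetical, and the pairing of each source code with M or F (for example 1 with M) is an assumption, since the slide lists the code sets but not the pairing.

```python
# Hypothetical per-application encodings of the same attribute.
# Which source code corresponds to "M" or "F" is assumed for illustration.
ENCODING_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}


def to_warehouse_code(application, value):
    """Translate an application-specific code to the single warehouse code."""
    try:
        return ENCODING_MAPS[application][str(value)]
    except KeyError:
        raise ValueError(
            f"unknown code {value!r} for application {application!r}"
        ) from None


# One record from each application, all describing the same attribute.
records = [("A", "F"), ("B", 1), ("C", "Y")]
integrated = [to_warehouse_code(app, val) for app, val in records]
print(integrated)  # ['F', 'M', 'F']
```

Raising an error on an unknown code, rather than passing it through, keeps unmapped values out of the integrated data.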
Volatility of Data
Volatile: insert, change, delete
Non-volatile: load, access
Measurable Data
Numeric data that can be used in mathematical operations and can be summarized and aggregated. Ex: net profit
Measurable data is required to evaluate the performance of a person, object, etc. For example, the net profit of a company can be used to evaluate the company's performance.
Measurable data is analyzed from different angles, referred to as dimensions.
At least two dimensions are required to evaluate a measure.
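As a rough sketch of evaluating a measure along two dimensions, the snippet below sums net profit by region and year. The dimension names (region, year) and all figures are invented for illustration and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, year, net_profit). Values are invented.
facts = [
    ("North", 2006, 120.0),
    ("North", 2007, 150.0),
    ("South", 2006, 80.0),
    ("South", 2007, 95.0),
    ("North", 2007, 30.0),
]


def aggregate(rows):
    """Sum the net profit measure for each (region, year) combination."""
    totals = defaultdict(float)
    for region, year, net_profit in rows:
        totals[(region, year)] += net_profit
    return dict(totals)


profit_by_region_year = aggregate(facts)
print(profit_by_region_year[("North", 2007)])  # 180.0
```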
Dimension Data
An angle from which measures are evaluated is referred to as a dimension.
A dimension can be a collection of sub-dimensions, referred to as levels.
The sub-dimensions within a dimension are arranged in a hierarchical relation, which means two sub-dimensions cannot be at the same level.
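The idea of levels within a dimension can be sketched with a time dimension whose levels are year, month and day. The choice of a time dimension, the names `TIME_LEVELS`, `level_key` and `roll_up`, and the sample quantities are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Levels of a hypothetical time dimension, ordered from coarsest to finest.
# Each level sits at its own depth, so no two levels share a position.
TIME_LEVELS = ("year", "month", "day")


def level_key(d, level):
    """Return the member of the time dimension that contains `d` at `level`."""
    depth = TIME_LEVELS.index(level) + 1
    return (d.year, d.month, d.day)[:depth]


def roll_up(sales, level):
    """Aggregate the quantity measure at the requested level of the hierarchy."""
    totals = defaultdict(int)
    for d, quantity in sales:
        totals[level_key(d, level)] += quantity
    return dict(totals)


sales = [
    (date(2007, 1, 5), 10),
    (date(2007, 1, 20), 4),
    (date(2007, 2, 3), 7),
]
by_month = roll_up(sales, "month")
by_year = roll_up(sales, "year")
print(by_month)  # {(2007, 1): 14, (2007, 2): 7}
print(by_year)   # {(2007,): 21}
```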
Types of schemas
Star schema
Starflake schema
Snowflake schema
Star schema
Measurable data sits in the center, surrounded by the different dimensions.
Each dimension has only one level, so there is no hierarchy.
No relation is defined between any two dimensions.
A combination of measures with their related dimensions is referred to as a cube.
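A minimal sketch of a star schema, using Python's built-in sqlite3 module: one central fact table holds the measure, and each single-level dimension table links only to the fact table, never to another dimension. The table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single-level dimensions: no table here references another dimension.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
-- Central fact table holding the measure (quantity_sold).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity_sold INTEGER
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_customer VALUES (10, 'Acme'), (20, 'Globex');
INSERT INTO fact_sales VALUES (1, 10, 5), (1, 20, 3), (2, 10, 7);
""")

# Evaluate the measure along the product dimension.
star_rows = conn.execute("""
SELECT p.product_name, SUM(f.quantity_sold)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY p.product_name
ORDER BY p.product_name
""").fetchall()
print(star_rows)  # [('Gadget', 7), ('Widget', 8)]
```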
Snowflake schema
Similar to the starflake schema, but the cube has at least one dimension with two or more levels, under at least two hierarchies.
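One common way to picture a multi-level dimension is to store each level in its own linked table. The sketch below does this for a hypothetical product dimension with two levels (category and product). The table layout and data are assumptions, and it shows a single hierarchy only, not the "at least two hierarchies" case described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Upper level of the product dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
-- Lower level, linked to its parent level.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity_sold INTEGER
);
INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'Editor', 2);
INSERT INTO fact_sales VALUES (1, 5), (2, 7), (3, 2), (1, 3);
""")

# Roll the measure up from the product level to the category level.
snowflake_rows = conn.execute("""
SELECT c.category_name, SUM(f.quantity_sold)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_category AS c ON c.category_id = p.category_id
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
print(snowflake_rows)  # [('Hardware', 15), ('Software', 2)]
```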
ETL
Extract, Transform, and Load (ETL) is a process in data warehousing that involves extracting data from outside sources, transforming it to fit business needs (which can include quality levels), and ultimately loading it into the end target, i.e. the data warehouse.
An ETL process can be created using almost any programming language, but creating one from scratch is quite complex. ETL tools are available to help in the creation of ETL processes.
A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization.
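The three ETL steps can be sketched in a few lines using only the Python standard library. This is a toy illustration, not a substitute for an ETL tool: the CSV layout, the quality rule (drop rows with no amount), and the `sales` target table are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be a file or a database.
SOURCE_CSV = """customer,gender,amount
alice,F,10.50
bob,M,
carol,f,4.25
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply an assumed quality rule and normalize the values."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed quality rule: skip rows without an amount
        clean.append(
            (row["customer"].title(), row["gender"].upper(), float(row["amount"]))
        )
    return clean


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, gender TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer, gender, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)  # [('Alice', 'F', 10.5), ('Carol', 'F', 4.25)]
```

Keeping extract, transform and load as separate functions mirrors the three stages, so each one can be replaced independently, for example by swapping the CSV reader for a database query.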