
Data Warehousing

Venkataraj Jayaraj

Confidential & Proprietary


Copyright © 2008 The Nielsen Company

Normal Reporting Architecture

[Figure: a single Source feeds several Reports directly]

Reports are generated directly from a source, without any transformation.

Benefits:
• Represents current data
• Simple and easy to design, and reports are easy to generate

Drawbacks:
• No historical data, so it may not be useful in the decision-making process.

Data Warehouse Architecture


[Figure: Sources (Oracle, Teradata, DB2, SQL Server) -> Staging Area -> Data Warehouse (Metadata, Raw Data, Summary Data) -> Data Mart -> Analysis, Reporting, Data Mining]

3 TCS Confidential
Copyright 2007 by Tata Consultancy Services. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted
in any form or by any means – electronic, mechanical, photocopying, recording, or otherwise – without the permission of Tata Consultancy Services.

Benefits:
• Better performance
• Simplified report generation
• Contains historical data

Drawbacks:
• No current data
• Administration overhead

Source:
A database from which the data is extracted. Ex: Oracle, Teradata, Sybase, DB2

Staging area:
A temporary storage area used for processing the data.

Metadata:
Data about the data, or a description of the data.

Data mart:
A data mart is a data warehouse for a specific domain.

A data mart can be one of two types:

• Independent data mart
• Dependent data mart


Independent Data mart


[Figure: Independent data mart architecture. Sources (Oracle, Teradata, DB2, SQL Server) -> Staging Area -> Data Mart -> Data Warehouse (Metadata, Raw Data, Summary Data) -> Analysis, Reporting, Data Mining]

Such data marts extract data directly from the source databases, and the data marts are then merged into the data warehouse.

Advantages:
• Maximum utilization of resources (hardware, software, manpower)
• Easy maintenance
• Reduced risk of failure

Disadvantages:
• Total cost of development is very high
• Integration problems

This approach is good for: large organizations


Dependent Data mart


[Figure: Dependent data mart architecture. Sources -> Staging Area -> Data Warehouse (Metadata, Raw Data, Summary Data) -> Data Mart -> Analysis, Reporting, Data Mining]


Such data marts extract their data from the data warehouse.

Advantages:
• Total cost and time of development are very low
• No integration problem

Disadvantages:
• Cannot use the full resources.

This approach is good for:

• Small and medium-sized organizations
• New organizations
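
The dependent data mart described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the warehouse and the mart live in one in-memory SQLite database; the table names `warehouse_facts` and `sales_mart` and the sample rows are hypothetical and do not come from the original material.

```python
import sqlite3

# Assumption: warehouse and data mart share one in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table holding data for several subjects.
CREATE TABLE warehouse_facts (subject TEXT, period TEXT, amount REAL);
INSERT INTO warehouse_facts VALUES
    ('sales', '2008-01', 100.0),
    ('sales', '2008-02', 150.0),
    ('inventory', '2008-01', 40.0);

-- The dependent data mart is filled from the warehouse, not from the sources.
CREATE TABLE sales_mart AS
    SELECT period, amount FROM warehouse_facts WHERE subject = 'sales';
""")

mart_rows = conn.execute(
    "SELECT period, amount FROM sales_mart ORDER BY period"
).fetchall()
print(mart_rows)
```

Because the mart only selects from data that is already integrated in the warehouse, no separate integration step is needed, which matches the "no integration problem" advantage listed above.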

What are Data Warehouses?


Data warehouses store large volumes of data that are frequently used by decision support systems (DSS).
A data warehouse is maintained separately from the organization's operational databases.
Data warehouses are relatively static, with only infrequent updates.
A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous, operational databases.


Data Warehousing
Data warehousing is the enabling technology that facilitates improved business decision-making. It is a process, not a product.
It is a technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing.

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decisions.

Subject Oriented Analysis


Process oriented (transactional storage):
  Entry: Sales Rep, Quantity Sold, Prod Number, Date, Customer Name, Product Description, Unit Price, Mail Address

Subject oriented (data warehouse storage):
  Sales, Customers, Products


Integration of Data
Attribute            Transactional Storage            Data Warehouse Storage
Encoding             Appl. A: M, F                    M, F
                     Appl. B: 1, 0
                     Appl. C: X, Y
Unit of attributes   Appl. A: pipeline cm             pipeline cm
                     Appl. B: pipeline inches
                     Appl. C: pipeline mcf
Physical attributes  Appl. A: balance dec(13,2)       balance dec(13,2)
                     Appl. B: balance PIC 9(9)V99
                     Appl. C: balance float
Data consistency     Appl. A: date (Julian)           date (Julian)
                     Appl. B: date (yymmdd)
                     Appl. C: date (absolute)
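
The integration step shown above can be sketched in a few lines of Python. This is only an illustration: which source code corresponds to "M" or "F" in applications B and C is an assumption, the function names are hypothetical, and application C's unit (mcf) is left out because the material gives no conversion for it.

```python
# Hypothetical mappings from each application's encoding onto the
# warehouse encoding "M"/"F". Which code means what is an assumption.
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}

CM_PER_INCH = 2.54  # exact definition of the inch in centimeters


def to_warehouse_gender(app, code):
    """Translate an application-specific gender code to 'M' or 'F'."""
    return GENDER_MAPS[app][str(code).upper()]


def to_warehouse_length_cm(app, value):
    """Convert a pipeline length to centimeters, the warehouse unit."""
    if app == "A":  # already stored in cm
        return float(value)
    if app == "B":  # stored in inches
        return float(value) * CM_PER_INCH
    # Appl. C uses mcf; no conversion is given, so it is not handled here.
    raise ValueError(f"no length conversion defined for application {app!r}")


print(to_warehouse_gender("B", 1), to_warehouse_gender("C", "y"))
print(to_warehouse_length_cm("B", 10))
```

Keeping the mappings in one table per attribute makes it easy to add a further source application without touching the conversion logic.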

Volatility of Data
Volatile (transactional storage):
  Insert, Change, Delete
  Record-by-record data manipulation

Non-volatile (data warehouse storage):
  Load, Access
  Mass load / access of data
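
The non-volatile pattern of mass load followed by read-only access can be sketched as follows. This assumes an in-memory SQLite database standing in for the warehouse; the table name `sales_history` and the batch contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_history (load_date TEXT, product TEXT, quantity INTEGER)"
)

# Hypothetical batch extracted from an operational system.
batch = [
    ("2008-01-31", "bolt", 10),
    ("2008-01-31", "nut", 4),
    ("2008-01-31", "washer", 3),
]

# Mass load: the whole batch is appended in one transaction.
with conn:
    conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", batch)

# Access: the loaded rows are only read afterwards, never updated in place.
total = conn.execute("SELECT SUM(quantity) FROM sales_history").fetchone()[0]
print(total)
```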


Dimensional modeling can be used to structure a data warehouse. It distinguishes two kinds of data:

1. Measurable data (measures)
2. Dimension data (dimensions)

Measurable Data
Numeric data that can be used in mathematical operations and can be summarized and aggregated. Ex: net profit
Measurable data is required to evaluate the performance of a person, object, etc. For example, the net profit of a company can be used to evaluate the company's performance.
Measurable data is analyzed from different angles, referred to as dimensions.
At least two dimensions are required to evaluate a measure.

Dimension Data
An angle from which measures are evaluated is referred to as a dimension.
A dimension can be a collection of sub-dimensions, referred to as levels.
The sub-dimensions within a dimension are arranged in a hierarchical relation, which means that two sub-dimensions cannot be at the same level.
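
As a small illustration of measures and dimensions, the sketch below aggregates a net profit measure along a time dimension with two levels (year, then quarter) and a region dimension. The sample figures, the choice of dimensions, and the helper name `roll_up` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, net_profit).
# "year > quarter" are two levels of a time dimension; region is a second dimension.
facts = [
    (2007, "Q1", "North", 120.0),
    (2007, "Q2", "North", 80.0),
    (2007, "Q1", "South", 50.0),
    (2008, "Q1", "North", 200.0),
]


def roll_up(rows, key):
    """Sum the net_profit measure for each distinct key built from a row."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# The same measure viewed at different levels of the time hierarchy.
by_quarter_region = roll_up(facts, lambda r: (r[0], r[1], r[2]))
by_year_region = roll_up(facts, lambda r: (r[0], r[2]))
by_year = roll_up(facts, lambda r: r[0])

print(by_year_region)
print(by_year)
```

Moving from the quarter level to the year level only changes the grouping key, which is what the hierarchical arrangement of levels makes possible.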

Types of schemas
• Star schema
• Star flake schema
• Snowflake schema

Star schema
Measurable data sits in the center, surrounded by the different dimensions.
Each dimension has only one level, so there is no hierarchy.
No relation should be defined between two dimensions.
The combination of measures with their related dimensions is referred to as a cube.


At the database level, a collection of measures becomes a table (referred to as a fact table).

The levels (sub-dimensions) within a dimension also become tables at the database level (referred to as dimension tables).

Data warehouse term         Database term
Cube                        Schema
Dimension                   -----
Level                       Table (dimension table)
Measure                     Table (fact table)
Relation/hierarchy          Constraint
Data warehouse/data mart    Database
Attributes                  Columns
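
A star schema with one fact table and two single-level dimension tables might look like the sketch below. It uses an in-memory SQLite database, and all table names, column names, and rows are hypothetical, chosen to echo the sales example earlier in this material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one level each, no relation between them.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);

-- Fact table: the measures, with one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity_sold INTEGER
);

INSERT INTO dim_product VALUES (1, 'Bolt'), (2, 'Nut');
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO fact_sales VALUES (1, 1, 2), (1, 2, 3), (2, 1, 7);
""")

# Evaluate the measure from the angle of the product dimension.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.quantity_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

The foreign keys play the role of the constraints in the term mapping above: they are the only link between the fact table and each dimension table.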

Star flake schema

Same as the star schema, but the cube has at least one dimension with two or more levels in a single hierarchy.

Snowflake schema
Same as the star flake schema, but the cube has at least one dimension with two or more levels under at least two hierarchies.
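
To make the idea of levels concrete, the sketch below gives a product dimension two levels (category, then product) in a single hierarchy, each level stored as its own table. In the terms used here that corresponds to the star flake case; adding a second hierarchy to a dimension would follow the same pattern. Table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Upper level of the product dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);

-- Lower level, linked to its parent level by a constraint.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);

CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity_sold INTEGER
);

INSERT INTO dim_category VALUES (1, 'Fasteners'), (2, 'Tools');
INSERT INTO dim_product VALUES (1, 'Bolt', 1), (2, 'Nut', 1), (3, 'Wrench', 2);
INSERT INTO fact_sales VALUES (1, 5), (2, 7), (3, 1);
""")

# Roll the measure up from the product level to the category level.
by_category = conn.execute("""
    SELECT c.category_name, SUM(f.quantity_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(by_category)
```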

ETL
Extract, Transform, and Load (ETL) is a process in data warehousing that involves extracting data from outside sources, transforming it to fit business needs (which can include quality levels), and ultimately loading it into the end target, i.e. the data warehouse.
Although an ETL process can be created using almost any programming language, creating one from scratch is quite complex. ETL tools are available to help in the creation of ETL processes.
A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization.
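
The three ETL steps can be sketched with the Python standard library alone. This is a toy illustration, not a substitute for an ETL tool: the CSV content, the quality rule (skip rows without a quantity), and the `sales` table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an outside source, as CSV text.
RAW_CSV = """sale_date,product,quantity,unit_price
2008-01-15,bolt,10,0.25
2008-01-16, Nut ,4,0.10
2008-01-17,bolt,,0.25
"""


def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: enforce a quality rule and normalize the values."""
    clean = []
    for row in rows:
        if not row["quantity"].strip():
            continue  # assumed quality rule: drop rows with no quantity
        quantity = int(row["quantity"])
        amount = round(quantity * float(row["unit_price"]), 2)
        clean.append(
            (row["sale_date"], row["product"].strip().lower(), quantity, amount)
        )
    return clean


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(sale_date TEXT, product TEXT, quantity INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT product, quantity, amount FROM sales ORDER BY sale_date"
).fetchall()
print(loaded)
```

Keeping extract, transform, and load as separate functions mirrors the way the process is described above and lets each step be replaced independently, for example by reading from a database instead of CSV text.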


Some of the ETL tools available in the market are:

• Ab Initio
• Apatar
• BusinessObjects Data Integrator
• Clover.ETL
• DMExpress
• Data Junction
• Data Transformation Services
• IBM WebSphere DataStage
• Informatica
• LogiXML
• Pentaho
• Pervasive Data Integrator
• RODIN Data Asset Management
• SQL Server Integration Services
• Scriptella
• Sprog (software)
• Sunopsis
• Talend Open Studio
