
Topic 1.

Need for Data Warehousing Overview & Concepts

Subject Incharge : Pratidnya S. Hegde Patil

Introduction to Data Warehouse

Textbook : Data Warehousing Fundamentals A comprehensive guide for IT Professionals, by Paulraj Ponniah, Publisher: John Wiley & Sons, 2nd Edition

Objectives
Understand the desperate need for strategic information
Recognize the information crisis at every enterprise
Distinguish between operational and informational systems
Learn why past attempts to provide strategic information failed
Clearly see why data warehousing is the viable solution

Data and Information

We're told we live in the information age. People often talk about data and information as if they were the same. They are, in many regards, opposite. A datum is just a fact: your name is a fact, your phone number is a fact. Information is data that is presented in a meaningful, understandable and beneficial format. Information is data that has been organized, sequenced, correlated and summarized, such as a phone book.

Data and Information

A phone book is information. It not only contains names and phone numbers, but it correctly associates each person's phone number with their name. It presents this list of correlated names and phone numbers in alphabetical sequence, so that we can find the phone number from the name. In addition, it divides the phone numbers into two types: personal and business. It is the function of the computer to convert data to information.

Definitions
Database: The database is a place where you put your data; data that you wish to convert to information at some future time.

Database Management System: A DBMS is the software that converts the data in your database to information. It is the DBMS that provides you the capability for cross-referencing, correlating, sorting, summarizing, etc.

Information as A Competitive Weapon


Information technology and quality information are not goals in themselves; they merely support organizations in reaching their goals of:

Superior products and services
Greater productivity
Eventual success

The Information Crisis


Integrated: Must have a single, enterprise-wide view.
Data Integrity: Information must be accurate and must conform to business rules.
Accessible: Easily accessible with intuitive access paths, and responsive for analysis.
Credible: Every business factor must have one and only one value.
Timely: Information must be available within the stipulated time frame.

ERP (Enterprise Resource Planning)

A software solution that addresses enterprise needs, taking the process view of an organization to meet the organization's goals. It integrates all the departments and functions across a company into a single computer system that can serve all those different departments' particular needs. It is a single application that supports (manages) all aspects (domains) of a company. It supports the day-to-day operations of the company. To ensure that transactions are fast, it maintains only recent data.

Where was ERP lacking?


Thousands of relational database tables, designed and normalized for running the business operations, were not at all suitable for providing strategic information.
The data repositories lacked data from external sources and from other operational systems in the company.

ERP

DSSs' Inability to Provide Strategic Information

IT receives too many ad hoc requests, resulting in a large overload.
With limited resources, IT is unable to respond to the numerous requests in a timely fashion.
Requests keep changing all the time.
The users require more reports to expand and understand the earlier reports.
Users go into a spiral of asking for more, thereby increasing the IT load.
Users have to depend on IT to provide information; the process is not user-centric.
IT is unable to provide a flexible and conducive environment for strategic decision making.

Decision Support System

Operational System

Informational Systems

Operational vs DSS

The Evolution of Data Warehousing

Since the 1970s, organizations have gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer.

This resulted in accumulation of growing amounts of data in operational databases.

The Evolution of Data Warehousing

Organizations now focused on ways to use operational data to support decision-making, as a means of gaining competitive advantage. However, operational systems were never designed to support such business activities; they deal with day-to-day transactions only. Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions.

The Evolution of Data Warehousing


Organizations need to turn their archives of data into a source of knowledge, so that a single integrated or consolidated view of the organization's data is presented to the user.
The data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.

Objectives of Today's Businesses

Access and combine data from a variety of data stores

Perform complex data analysis across these data stores

Create multidimensional views of data and its metadata

Easily summarize and roll up the information across subject areas and business dimensions

These objectives cannot be met easily

Data is scattered in many types of incompatible structures.
Lack of documentation has prevented the integration of older legacy systems with newer systems.
Accurate and accessible metadata across multiple organizations is hard to get.

A New Type of System Environment : DW


Data is designed for analytical tasks
Data from multiple applications
Easy to use and conducive to long interactive sessions by users
Read-intensive data usage
Direct interaction with the system by the users without IT assistance
Content updated periodically and stable
Content to include current and historical data
Ability for users to run queries and get results online
Ability for users to initiate reports

Four Levels of Analytical Processing

In a modern organization, at least four levels of analytical processing should be supported by information systems.
First level: Consists of simple queries and reports against current and historical data.
Second level: Goes deeper and requires the ability to do "what if" processing across data store dimensions.

Four Levels of Analytical Processing


Third level: Needs to step back and analyze what has previously occurred to bring about the current status of the data.
Fourth level: Analyzes what has happened in the past and what needs to be done in the future in order to bring about some specific change.

Business Intelligence at the DW

What is a Data Warehouse?


Data Warehousing is a decision support system. It extracts data from various source systems, e.g. ERP and CRM. It holds historical data kept in a single uniform format.

In summary, a DW is:
An ideal environment for data analysis and decision support.
Flexible and interactive.
100% user-driven.
Very responsive and conducive to the ask-answer-ask-again pattern.
Able to provide answers to complex, unpredictable questions.

Characteristics
The new concept is not to generate fresh data, but to make use of the large volumes of existing data and to transform it into forms suitable for providing strategic information.
It is a user-centric environment, not a product: a computing environment where users can find strategic information.
A central database that is loaded from multiple operational databases for the purpose of end-user access and decision support.
A data warehouse differs from an operational system in that the data it contains is normally static and updated in a scheduled manner through massive loading procedures.
A data warehouse is developed to accommodate random, ad hoc queries and to allow users to drill down to minute levels of detail.

Concept of Data Warehousing


Take all the data from the operational systems.
Where necessary, include relevant data from outside, such as industry benchmark indicators.
Integrate all the data from the various sources.
Remove inconsistencies and transform the data.
Store the data in formats suitable for easy access for decision making.
This simple concept involves different functions: data extraction, loading the data, transformation, storage, and providing user interfaces.
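The functions listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production pipeline: the two source lists, the field names (`branch`, `item`, `amount`) and the helper functions `extract`, `transform` and `load` are all hypothetical.

```python
# Hypothetical operational sources; names, fields and values are assumptions.
MUMBAI_SALES = [
    {"branch": "mumbai", "item": "Toaster ", "amount": "120.50"},
    {"branch": "mumbai", "item": "kettle", "amount": "80"},
]
DELHI_SALES = [
    {"branch": "DELHI", "item": "toaster", "amount": "99.5"},
    {"branch": "DELHI", "item": "Kettle", "amount": None},  # inconsistent record
]


def extract(*sources):
    """Extraction: pull every record from every source system."""
    for source in sources:
        yield from source


def transform(records):
    """Transformation: drop unusable rows, put the rest in one uniform format."""
    for rec in records:
        if rec["amount"] is None:
            continue  # remove inconsistencies
        yield {
            "branch": rec["branch"].strip().title(),
            "item": rec["item"].strip().lower(),
            "amount": float(rec["amount"]),
        }


def load(records, warehouse):
    """Loading: append the cleaned records to the warehouse store."""
    warehouse.extend(records)
    return warehouse


warehouse = load(transform(extract(MUMBAI_SALES, DELHI_SALES)), [])
for row in warehouse:
    print(row)
```

A plain list stands in for the warehouse storage here; in practice the load step would write to a database.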

Blend of Technologies
Different technologies needed to support data warehousing functions.

Scenario 1
ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

Scenario 1 : ABC Pvt Ltd.


[Figure: the Mumbai, Delhi, Chennai and Bangalore branch systems; the Sales Manager needs sales per item type per branch for the first quarter.]

Solution 1 : ABC Pvt Ltd.

Extract sales information from each database.
Store the information in a common repository at a single site.

Solution 1 : ABC Pvt Ltd.

[Figure: data from the Mumbai, Delhi, Chennai and Bangalore systems flows into the data warehouse; the Sales Manager gets the report through query & analysis tools.]

Scenario 2
One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait for some time.

Scenario 2 : One Stop Shopping


[Figure: data entry operators and management share the operational database; while management runs a report, the data entry operators wait.]

Solution 2
Extract the data needed for analysis from the operational database.
Store it in a warehouse.
Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
The warehouse will contain data with a historical perspective.

Solution 2
[Figure: data entry operators send transactions to the operational database; data is extracted into the data warehouse, from which the manager gets reports.]

Scenario 3
Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.

Solution 3
Improve the quality of data before loading it into the warehouse.
Perform data cleaning and transformation before loading the data.
Use query and analysis tools to support ad hoc queries.

Solution 3
[Figure: the President uses a query and analysis tool on the data warehouse to view sales over time and decide on expansion and improvement.]

Need for Data Warehousing


Industry has a huge amount of operational data.
Knowledge workers want to turn this data into useful information.
They use this information to support strategic decision making.
The data warehouse is a platform for consolidated historical data for analysis.
It stores data of good quality so that knowledge workers can make correct decisions.

Need for Data Warehousing (contd..)


From a business perspective:
It is the latest marketing weapon.
It helps to keep customers by learning more about their needs.
It is a valuable tool in today's competitive, fast-evolving world.

What is a Data Warehouse?

Inmon's definition
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.

Subject-oriented
A data warehouse is organized around subjects such as sales, product and customer.
It focuses on modeling and analysis of data for decision makers.
It excludes data not useful in the decision support process.

Integration
A data warehouse is constructed by integrating multiple heterogeneous sources.
Data preprocessing techniques are applied to ensure consistency.
[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.]

Integration
Integration is needed in terms of:
encoding structures
measurement of attributes
physical attributes of data
naming conventions
data type format
remarks
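As a small illustration of integrating encoding structures and measurement of attributes, the sketch below maps records from three imagined source systems onto one encoding and one unit. The attribute names, the gender codes and the height units are assumptions chosen for the example; they do not come from the slides.

```python
# Hypothetical encodings used by different source systems for the same attribute.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}

# Conversion factors to a single unit of measurement (centimetres).
TO_CM = {"cm": 1.0, "inch": 2.54, "m": 100.0}


def integrate(record):
    """Map one source record onto the warehouse's encoding and units."""
    return {
        "gender": GENDER_CODES[str(record["gender"]).strip().lower()],
        "height_cm": round(record["height"] * TO_CM[record["unit"]], 2),
    }


sources = [
    {"gender": "male", "height": 1.8, "unit": "m"},
    {"gender": 0, "height": 65, "unit": "inch"},
    {"gender": "F", "height": 170, "unit": "cm"},
]
integrated = [integrate(r) for r in sources]
print(integrated)
```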

Time-variant
Provides information from a historical perspective, e.g. the past 5-10 years.
Every key structure contains, either implicitly or explicitly, an element of time.

Nonvolatile
Data once recorded cannot be updated.
A data warehouse requires two operations in data accessing:
Initial loading of data
Access of data
[Figure: load and access operations on the data warehouse]
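A rough sketch of the time-variant and nonvolatile properties: the class below offers only a load operation and a read operation, and stamps every row with a time element. The class name `WarehouseTable`, its methods and the sample customer rows are hypothetical.

```python
from datetime import date


class WarehouseTable:
    """Append-only store: supports loading and reading, but no update or delete."""

    def __init__(self):
        self._rows = []

    def load(self, rows, load_date):
        """Initial or periodic load; every row is stamped with a time element."""
        for row in rows:
            self._rows.append({**row, "load_date": load_date})

    def access(self, predicate=lambda row: True):
        """Read-only access; returns copies so callers cannot change stored rows."""
        return [dict(row) for row in self._rows if predicate(row)]


table = WarehouseTable()
table.load([{"customer": "C1", "city": "Pune"}], date(1998, 1, 1))
# A later change is loaded as a new snapshot rather than overwriting the old row.
table.load([{"customer": "C1", "city": "Mumbai"}], date(1998, 2, 1))

history = table.access(lambda row: row["customer"] == "C1")
print([(row["city"], row["load_date"].isoformat()) for row in history])
```

Because old rows are never overwritten, the history of a customer stays available for analysis.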

Operational v/s Information System


Features | Operational | Information
Characteristics | Operational processing | Informational processing
Orientation | Transaction | Analysis
User | Clerk, DBA, database professional | Knowledge workers
Function | Day-to-day operation | Decision support
Data | Current | Historical
View | Detailed, flat relational | Summarized, multidimensional
DB design | Application oriented | Subject oriented
Unit of work | Short, simple transaction | Complex query
Access | Read/write | Mostly read

Operational v/s Information System


Features | Operational | Information
Focus | Data in | Information out
Number of records accessed | Tens | Millions
Number of users | Thousands | Hundreds
DB size | 100 MB to GB | 100 GB to TB
Priority | High performance, high availability | High flexibility, end-user autonomy
Metric | Transaction throughput | Query throughput

Data Warehousing Architecture


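[Figure: data warehousing architecture. Data sources (operational DBs, external sources) are extracted, transformed, loaded and refreshed into the data warehouse of reconciled data, with a metadata repository and monitoring & administration. The warehouse and data marts serve the OLAP servers, which feed the tools: analysis, query/reporting and data mining.]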

Data Warehouse Architecture


Data warehouse server: almost always a relational DBMS, rarely flat files
OLAP servers: support and operate on multi-dimensional data structures
Clients: query and reporting tools, analysis tools, data mining tools

Data Warehouse Schema


Star Schema
Fact Constellation Schema
Snowflake Schema

Star Schema
A single, large and central fact table and one table for each dimension.
Every fact points to one tuple in each of the dimensions and has additional attributes.
Does not capture hierarchies directly.

Star Schema (contd..)


Store Dimension: Store Key, Store Name, City, State, Region
Product Dimension: Product Key, Product Desc
Time Dimension: Period Key, Year, Quarter, Month
Fact Table: Store Key, Product Key, Period Key, Units, Price

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
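The star schema above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names (`store_dim`, `product_dim`, `time_dim`, `sales_fact`) are lower-case renderings of the names on the slide, and all inserted rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter TEXT, month TEXT);
CREATE TABLE sales_fact  (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL
);
""")
# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO store_dim VALUES (?,?,?,?,?)", [
    (1, "Store A", "Mumbai", "Maharashtra", "West"),
    (2, "Store B", "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO product_dim VALUES (?,?)",
                 [(10, "Toaster"), (11, "Kettle")])
conn.executemany("INSERT INTO time_dim VALUES (?,?,?,?)",
                 [(100, 1998, "Q1", "January"), (101, 1998, "Q1", "February")])
conn.executemany("INSERT INTO sales_fact VALUES (?,?,?,?,?)", [
    (1, 10, 100, 5, 20.0),
    (1, 11, 101, 2, 15.0),
    (2, 10, 100, 3, 20.0),
])

# Each dimension is exactly one join away from the fact table.
star_rows = conn.execute("""
    SELECT s.region, p.product_desc, t.quarter, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s   ON s.store_key = f.store_key
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN time_dim t    ON t.period_key = f.period_key
    GROUP BY s.region, p.product_desc, t.quarter
    ORDER BY s.region, p.product_desc
""").fetchall()
print(star_rows)
```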

SnowFlake Schema
Variant of the star schema model.
A single, large and central fact table and one or more tables for each dimension.
Dimension tables are normalized, i.e. dimension table data is split into additional tables.

SnowFlake Schema (contd..)


Store Dimension: Store Key, Store Name, City Key
City Dimension: City Key, City, State, Region
Product Dimension: Product Key, Product Desc
Time Dimension: Period Key, Year, Quarter, Month
Fact Table: Store Key, Product Key, Period Key, Units, Price

Drawbacks: Time-consuming joins, slow report generation.
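To see where the extra joins come from, the sketch below normalizes the store dimension into `store_dim` and `city_dim` as on the slide, using Python's `sqlite3`. The fact table is reduced to two columns to keep the example short, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim   (city_key INTEGER PRIMARY KEY, city TEXT,
                         state TEXT, region TEXT);
CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, store_name TEXT,
                         city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         units INTEGER);
""")
# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO city_dim VALUES (?,?,?,?)", [
    (1, "Mumbai", "Maharashtra", "West"),
    (2, "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO store_dim VALUES (?,?,?)",
                 [(1, "Store A", 1), (2, "Store B", 1), (3, "Store C", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?,?)", [(1, 5), (2, 2), (3, 3)])

# Reaching Region now takes two joins (fact -> store -> city) instead of one.
snowflake_rows = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim c  ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(snowflake_rows)
```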

Fact Constellation
Multiple fact tables share dimension tables.
This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
Sophisticated applications require such a schema.

Fact Constellation (contd..)


Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Product Dimension: Product Key, Product Desc

Building Data Warehouse


Data Selection
Data Preprocessing: fill missing values, remove inconsistency
Data Transformation & Integration
Data Loading
Data in the warehouse is stored in the form of fact tables and dimension tables.

Case Study
Afco Foods & Beverages is a new company which produces dairy, bread and meat products, with its production unit located at Baroda. Their products are sold in the North, North West and Western regions of India. They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda. The President of the company wants sales information.

Sales Information
Report: The number of units sold. 113

Report: The number of units sold over time

January 14

February 41

March 33

April 25

Sales Information
Report: The number of items sold for each product with time
Table (Product by Time): rows Wheat Bread, Cheese, Swiss Rolls; columns Jan, Feb, Mar, Apr. Values as extracted: 6, 8, 16, 25, 6, 6, 21, 17, 8.

Sales Information
Report: The number of items sold in each City for each product with time
Table (City and Product by Time): rows Mumbai and Pune, each with Wheat Bread, Cheese, Swiss Rolls; columns Jan, Feb, Mar, Apr. Values as extracted: 3, 4, 3, 4, 3, 16, 16, 6, 6, 3, 10, 7, 8, 15.

Sales Information
Report: The number of items sold and income in each region for each product with time.
Table (City and Product by Time): rows Mumbai and Pune, each with Wheat Bread, Cheese, Swiss Rolls; columns Jan, Feb, Mar, Apr, each split into Rs (rupees) and U (units). Values as extracted: 7.95 7.32 3 4 16.47 9 27.45 15 7.95 7.32 3 4 42.40 29.98 7.44 16 15.90 16 10.98 7.44 3 6 6 3 17.36 21.20 7 8 24.80 10.

Sales Measures & Dimensions


Measures: Units sold, Amount.
Dimensions: Product, Time, Region.

Sales Data Warehouse Model


Fact Table

City | Product | Month | Units | Rupees
Mumbai | Wheat Bread | January | 3 | 7.95
Mumbai | Cheese | January | 4 | 7.32
Pune | Wheat Bread | January | 3 | 7.95
Pune | Cheese | January | 4 | 7.32
Mumbai | Swiss Rolls | February | 16 | 42.40

Sales Data Warehouse Model


City_ID | Prod_ID | Month | Units | Rupees
1 | 589 | 1/1/1998 | 3 | 7.95
1 | 1218 | 1/1/1998 | 4 | 7.32
2 | 589 | 1/1/1998 | 3 | 7.95
2 | 1218 | 1/1/1998 | 4 | 7.32
1 | 589 | 2/1/1998 | 16 | 42.40

Sales Data Warehouse Model


Product Dimension Tables

Prod_ID | Product_Name | Product_Category_ID
589 | Wheat Bread | 1
590 | White Bread | 1
288 | Coconut Cookies | 2

Product_Category_Id | Product_Category
1 | Bread
2 | Cookies

Sales Data Warehouse Model


Region Dimension Table

City_ID | City | Region | Country
1 | Mumbai | West | India
2 | Pune | NorthWest | India
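To illustrate how a dimension table gives meaning to the keys in the fact table, the sketch below rolls the fact rows up by Region in plain Python. The rows are copied from the fact table and the region dimension table of this model; the variable names (`sales_fact`, `region_dim`, `by_region`) are only illustrative.

```python
from collections import defaultdict

# Fact rows of the model: (City_ID, Prod_ID, Month, Units, Rupees).
sales_fact = [
    (1, 589, "1/1/1998", 3, 7.95),
    (1, 1218, "1/1/1998", 4, 7.32),
    (2, 589, "1/1/1998", 3, 7.95),
    (2, 1218, "1/1/1998", 4, 7.32),
    (1, 589, "2/1/1998", 16, 42.40),
]
# Region dimension rows: City_ID -> (City, Region, Country).
region_dim = {1: ("Mumbai", "West", "India"), 2: ("Pune", "NorthWest", "India")}

# Look up each fact row's region through its City_ID and add up the measures.
totals = defaultdict(lambda: [0, 0.0])
for city_id, _prod_id, _month, units, rupees in sales_fact:
    region = region_dim[city_id][1]
    totals[region][0] += units
    totals[region][1] += rupees

by_region = {region: (units, round(rupees, 2))
             for region, (units, rupees) in totals.items()}
print(by_region)
```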

Sales Data Warehouse Model


[Figure: the Sales Fact table linked to the Time, Product and Region dimensions, with Product linked to Product Category.]

Online Analytical Processing (OLAP)

It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
[Figure: data warehouse data viewed along the Product and Time axes]

OLAP Cube
City | Product | Time | Units | Dollars
All | All | All | 113 | 251.26
Mumbai | All | All | 64 | 146.07
Mumbai | White Bread | All | 38 | 98.49
Mumbai | Wheat Bread | All | 13 | 32.24
Mumbai | Wheat Bread | Qtr1 | 3 | 7.44
Mumbai | Wheat Bread | March | 3 | 7.44
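The "All" rows of such a cube are totals over every value of a dimension. A minimal sketch of how they can be computed is given below; the detail rows and the function `build_cube` are hypothetical and are not the figures from the table.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail rows: (city, product, month, units).
rows = [
    ("Mumbai", "Wheat Bread", "January", 3),
    ("Mumbai", "Cheese", "January", 4),
    ("Pune", "Wheat Bread", "January", 3),
    ("Mumbai", "Wheat Bread", "February", 16),
]


def build_cube(detail_rows):
    """Sum units for every combination of a concrete value or 'All' per dimension."""
    cube = defaultdict(int)
    for city, prod, month, units in detail_rows:
        # Each row contributes to 2 * 2 * 2 = 8 cells of the cube.
        for key in product((city, "All"), (prod, "All"), (month, "All")):
            cube[key] += units
    return dict(cube)


cube = build_cube(rows)
print(cube[("All", "All", "All")])
print(cube[("Mumbai", "All", "All")])
print(cube[("Mumbai", "Wheat Bread", "All")])
```

Precomputing every combination makes each lookup fast, at the cost of storing many more cells than detail rows.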

OLAP Operations
Drill Down: move from summary to detail along a hierarchy, e.g. Product Category (Electrical Appliance) to Sub Category (Kitchen) to Product (Toaster), viewed over Time.

OLAP Operations
Drill Up: move from detail to summary along a hierarchy, e.g. Product (Toaster) to Sub Category (Kitchen) to Product Category (Electrical Appliance), viewed over Time.

OLAP Operations
Slice and Dice: restrict the view to selected values of a dimension, e.g. Product = Toaster, viewed over Time.

OLAP Operations
Pivot: rotate the view to other axes, e.g. from Product by Time to Product by Region.
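The operations above can be imitated on a small in-memory data set. This is a sketch only: the rows in `SALES` and the helper functions `aggregate`, `slice_rows` and `pivot` are hypothetical and do not represent any OLAP product's API.

```python
from collections import defaultdict

# Hypothetical sales rows: (category, sub_category, product, month, units).
SALES = [
    ("Electrical Appliance", "Kitchen", "Toaster", "Jan", 5),
    ("Electrical Appliance", "Kitchen", "Toaster", "Feb", 7),
    ("Electrical Appliance", "Kitchen", "Mixer", "Jan", 2),
    ("Electrical Appliance", "Laundry", "Iron", "Jan", 4),
]
LEVELS = {"category": 0, "sub_category": 1, "product": 2}


def aggregate(rows, level):
    """Total units per (value at the chosen product level, month)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row[LEVELS[level]], row[3])] += row[4]
    return dict(totals)


def slice_rows(rows, product_name):
    """Slice: fix one dimension to a single value, e.g. Product = Toaster."""
    return [row for row in rows if row[2] == product_name]


def pivot(table):
    """Pivot: swap the two axes of a (row_key, column_key) -> value table."""
    return {(col, row): value for (row, col), value in table.items()}


rolled_up = aggregate(SALES, "category")      # drill up: coarsest level
drilled_down = aggregate(SALES, "product")    # drill down: finest level
toaster_only = aggregate(slice_rows(SALES, "Toaster"), "product")
by_month_first = pivot(drilled_down)
print(rolled_up)
print(toaster_only)
```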

Cube view of Data

OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures.
The OLAP servers available are:
MOLAP server
ROLAP server
HOLAP server

Presentation
[Figure: a reporting tool presents a report by Product and Time.]

Data Warehousing includes


Building the data warehouse
Online analytical processing (OLAP)
Presentation

[Figure: RDBMS and flat file sources go through cleaning, selection & integration into the warehouse & OLAP server, which the client uses for presentation.]

Data Warehousing Tools


Data warehouse: SQL Server 2000 DTS, Oracle 8i Warehouse Builder
OLAP tools: SQL Server Analysis Services, Oracle Express Server
Reporting tools: MS Excel Pivot Chart, VB Applications
