
Data Warehousing

Naveed Iqbal, Assistant Professor FAST-NU, Islamabad


(Lecture Slides Week # 3)


Data Warehouse: How is it Different?


4. Usually (but not always) periodic or batch updates rather than real-time

The boundary is blurring with active data warehousing. For an ATM, an update that is not real-time means real trouble. A DWH supports strategic decision making based on historical data, so it does no harm if the transactions of the last hour or day are absent. The rate of update depends on:

- Volume of data
- Nature of business
- Cost of keeping historical data
- Benefit of keeping historical data
Data Warehousing - Fall 2010 2


Data Warehouse: How is it Different?


5. Starts with 6x12 availability requirement but 7x24 usually becomes the goal.

Decision makers typically don't work 24 hours a day, 7 days a week. An ATM system does. Once decision makers start using the DWH and reaping its benefits, they start liking it. They use the DWH more and more often, until they want it available 100% of the time. For a business that operates across the globe, 50% of the world may be asleep at any one time, but the business is up 100% of the time. 100% availability is not a trivial task; it requires taking into account loading strategies, refresh rates, etc.
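
As a rough worked example of why the move from 6x12 to 7x24 affects loading strategies, the sketch below computes how many hours per week remain for batch loads under each availability target. The function name `weekly_offline_hours` is made up for illustration and is not part of the lecture.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def weekly_offline_hours(days_per_week: int, hours_per_day: int) -> int:
    """Hours per week in which the warehouse may be offline for loading."""
    return HOURS_PER_WEEK - days_per_week * hours_per_day


# 6x12 leaves a generous batch window; 7x24 leaves none.
print("6x12:", weekly_offline_hours(6, 12))
print("7x24:", weekly_offline_hours(7, 24))
```

With no offline window, loads and refreshes have to run while users are querying, which is why 100% availability changes the loading strategy.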

Data Warehouse: How is it Different?


6. Does not follow the traditional development model

Traditional model: Requirements gathering -> Analysis -> Design -> Programming -> Testing -> Integration -> Implementation

Data warehouse model: Implement warehouse -> Integrate data -> Test for bias / incorrectness -> Program w.r.t. data -> Design DSS system -> Analyze results -> Understand requirements

Data Warehouse: How is it Different?


7. Comparison of response times

- OLAP (Online Analytical Processing) queries must be executed in a small number of seconds.
    - Often requires de-normalization and/or sampling.
- Complex query scripts and large list selections can generally be executed in a small number of minutes.
- Sophisticated clustering algorithms (e.g. for data mining) can generally be executed in a small number of hours, even for hundreds of thousands of customers.
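
To illustrate why sampling can bring an analytical query down to interactive response times, here is a minimal sketch that compares an exact average over a full scan with an estimate from a 1% random sample. The data is synthetic, and the names `estimate_mean` and `sales` are assumptions made for this example only.

```python
import random


def estimate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size


# Hypothetical fact-table column: synthetic sale amounts.
data_rng = random.Random(42)
sales = [data_rng.randint(1, 1000) for _ in range(200_000)]

exact = sum(sales) / len(sales)      # full scan: touches every row
approx = estimate_mean(sales, 2_000)  # sample: touches 1% of the rows
print(f"exact={exact:.1f}  approx={approx:.1f}")
```

The estimate is not exact, so sampling suits trend-style questions better than queries that need a precise figure.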


Data Warehouse: How is it Different?


8. Data Warehouse vs. OLTP (Online Transaction Processing)

OLTP:
Select tx_date, balance from tx_table where account_ID = 829;

DWH:
Select balance, age, sal, gender from customer_table, tx_table where age between 30 and 40 and education = 'graduate' and customer_table.custID = tx_table.customer_ID;

OLTP                                DWH
Primary key used                    Primary key NOT used
No concept of primary index         Primary index used
May use a single table              Mostly uses multiple tables
Normally few rows returned          Normally many rows returned
High selectivity of query           Low selectivity of query
Indexing on primary key (unique)    Indexing on primary index (nonunique)
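
The two query styles can be tried out with a small in-memory SQLite database. This is only a sketch: the table and column names follow the slide, but the column types, the extra columns (`tx_id`), the index, and all rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_table (custID INTEGER PRIMARY KEY, age INTEGER,
                                 sal INTEGER, gender TEXT, education TEXT);
    CREATE TABLE tx_table (tx_id INTEGER PRIMARY KEY, account_ID INTEGER,
                           customer_ID INTEGER, tx_date TEXT, balance INTEGER);
    CREATE INDEX idx_tx_account ON tx_table(account_ID);

    -- Hypothetical rows, for illustration only.
    INSERT INTO customer_table VALUES
        (1, 35, 50000, 'F', 'graduate'),
        (2, 38, 62000, 'M', 'graduate'),
        (3, 52, 80000, 'M', 'graduate'),
        (4, 33, 30000, 'F', 'undergraduate');
    INSERT INTO tx_table VALUES
        (10, 829, 1, '2010-09-01', 1200),
        (11, 830, 2, '2010-09-01',  900),
        (12, 831, 3, '2010-09-02', 4000),
        (13, 832, 4, '2010-09-02',  150),
        (14, 830, 2, '2010-09-03',  700);
""")

# OLTP style: highly selective lookup of a single account.
oltp_rows = conn.execute(
    "SELECT tx_date, balance FROM tx_table WHERE account_ID = ?", (829,)
).fetchall()

# DWH style: join plus range/category predicates, not keyed on one account.
dwh_rows = conn.execute("""
    SELECT t.balance, c.age, c.sal, c.gender
    FROM customer_table AS c
    JOIN tx_table AS t ON c.custID = t.customer_ID
    WHERE c.age BETWEEN 30 AND 40 AND c.education = 'graduate'
    ORDER BY t.tx_id
""").fetchall()

print(oltp_rows)
print(dwh_rows)
```

Even on this toy data the OLTP query returns one row for one account, while the DWH query returns rows for every customer matching the demographic predicate.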

DWH vs. OLTP: Summary


Scope
  DWH:  Application neutral; single source of truth; evolves over time; how to improve business
  OLTP: Application specific; multiple databases with repetition; off-the-shelf application; runs the business

Data Perspective
  DWH:  Historical, detailed data; some summary; lightly de-normalized
  OLTP: Operational data; no summary; fully normalized

Queries
  DWH:  Hardly uses PK; no. of returned results in Ks
  OLTP: Based on PK; no. of returned results in 100s

Time Factor
  DWH:  Minutes to hours; typical availability 12x6
  OLTP: Sub seconds to seconds; typical availability 24x7


DWH vs. OLTP: Summary


                 DWH                          OLTP
Characteristics  Informational processing     Operational processing
Orientation      Analysis                     Transaction
Users            Knowledge workers            Clerks, DBAs etc.
Function         Decision support             Day to day operation
DB Design        Subject oriented             Application oriented
Unit of work     Complex query                Short, simple transaction
Access           Mostly read                  Read / write
Focus            Information out              Data in
Priority         High flexibility / autonomy  High performance / availability
Metric           Query throughput             Transaction throughput


Putting the pieces together


[Figure: multi-tier data warehouse architecture. Data sources (Tier 0): operational databases, semistructured sources, www data, archived data. Data Warehouse Server (Tier 1): Extract Transform Load (ETL) into the data warehouse, with meta data and data marts. OLAP Servers (Tier 2): MOLAP and ROLAP. Clients (Tier 3): query/reporting, analysis and data mining tools, used by IT users and business users.]


Why is this hard?


- Data sources are unstructured and heterogeneous.
- Requirements are always changing.
- Most computer scientists are trained on OLTP systems; those concepts are not valid for VLDB and DSS.
- The scale factor in VLDB implementations is difficult to comprehend.
- Performance impacts are often non-linear, O(n) vs. O(n log n), e.g. scanning vs. indexing.
- Complex computer/database architectures.
- Rapidly changing product characteristics.
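
The scanning-versus-indexing remark can be made concrete by counting steps. The sketch below is one possible reading of that point, not the lecturer's own example: a linear scan grows in proportion to the table size, while a lookup through a sorted index (binary search here) grows only logarithmically. The function names are hypothetical.

```python
def scan_steps(rows, key):
    """Linear scan: count rows examined until `key` is found. O(n)."""
    steps = 0
    for value in rows:
        steps += 1
        if value == key:
            break
    return steps


def index_steps(sorted_rows, key):
    """Binary search over a sorted index: count probes. O(log n)."""
    lo, hi, steps = 0, len(sorted_rows), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_rows[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return steps


for n in (1_000, 1_000_000):
    rows = list(range(n))  # already sorted, so it doubles as the index
    print(n, scan_steps(rows, n - 1), index_steps(rows, n - 1))
```

Growing the table a thousandfold multiplies the scan cost by a thousand but adds only a handful of index probes, which is why intuition built on small OLTP tables does not carry over to VLDB scale.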


DWH: High level implementation steps


Phase-I
- Determine user needs
- Determine DBMS server platform
- Determine hardware platform(s)
- Information and data modeling
- Construct metadata repository

Phase-II
- Data acquisition and cleansing
- Data transform, transport and populate
- Determine middleware connectivity
- Prototyping, querying and reporting
- Data Mining
- Online Analytical Processing (OLAP)

Phase-III
- Deployment and System Management

Types of Data Warehouses


- Financial
- Telecommunication
- Insurance
- Human Resource
- Global
- Exploratory


Types of Data Warehouses


Financial

The first data warehouse that an organization builds. This is appealing because:

- Nerve center, easy to get attention.
- In most organizations, smallest data set.
- Touches all aspects of an organization with a common denominator, i.e. money.
- Inherent structure of data directly influenced by the day-to-day activities of financial processing.


Types of Data Warehouses


Telecommunication

- Dominated by sheer volume of data.
- Many ways to accommodate call-level detail:
    - Keeping only a few months of call-level detail.
    - Storing lots of call-level detail scattered over different storage media.
    - Storing only selective call-level detail, etc.
- Unfortunately, for many kinds of processing, working at an aggregate level is simply not possible, as finding patterns will be difficult.


Types of Data Warehouses


Insurance

Insurance data warehouses are similar to other data warehouses, BUT with a few exceptions:

- They store data that is very old and used for actuarial processing / analysis.
- A typical business may have changed dramatically over the last 40-50 years, but not insurance.
- In retailing or telecom there are a few important dates, but in the insurance environment there are many dates of many kinds.


Types of Data Warehouses


Insurance

Insurance data warehouses are similar to other data warehouses, BUT with a few exceptions (Contd.):

- Long operational business cycles, measured in years, with processing times in months. Thus the operating speed is different.
- Transactions are not gathered and processed, but are kept in a kind of frozen state.
- Thus a unique approach to design and implementation.


DWH: Typical Applications

The impact on an organization's core business is to streamline it and maximize profitability.


- Fraud Detection
- Profitability Analysis
- Direct Mail / Database Marketing
- Credit Risk Prediction
- Customer Retention Modeling
- Yield Management
- Inventory Management

ROI on any one of these applications can justify HW / SW and Consultancy costs in most organizations.

DWH: Typical Applications


Fraud Detection

- By observing data usage patterns
- People have typical purchase patterns
- Deviation patterns
- Certain cities notorious for fraud
- Certain items bought by stolen cards
- Similar behavior for stolen cards
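
One simple way to express "deviation from a typical purchase pattern" is a z-score rule over a customer's past purchase amounts. The sketch below is an assumed illustration, not a method given in the lecture; the function name, the threshold of 3 standard deviations, and the purchase history are all made up.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard deviations
    away from the customer's typical purchase amount."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical purchase history for one card holder.
history = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8]
print(is_suspicious(history, 44.0))   # close to the usual amounts
print(is_suspicious(history, 900.0))  # far outside the usual pattern
```

A real system would combine several signals from the slide (city, item type, behavior typical of stolen cards) and not rely on amount alone.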


DWH: Typical Applications


Profitability Analysis

- Banks know whether they are profitable or not
- Don't know which customers are profitable
- Typically more than 50% are NOT profitable
- Don't know which ones
- Balance is not enough, transactional behavior is the key
- Restructure products and pricing strategies
- Lifetime profitability models (next 3-5 years)
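
The point that balance alone is not enough can be shown with a toy per-customer profit calculation. Everything in this sketch is an assumption for illustration: the `Customer` fields, the margin rate, and the per-transaction costs are invented numbers, not figures from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    avg_balance: float  # average balance held
    branch_visits: int  # teller transactions per year
    online_txns: int    # self-service transactions per year


# Assumed rates and costs, for illustration only.
MARGIN_RATE = 0.02   # yearly margin earned on balances
BRANCH_COST = 4.00   # cost to the bank per teller transaction
ONLINE_COST = 0.10   # cost to the bank per online transaction


def yearly_profit(c: Customer) -> float:
    """Revenue from the balance minus the cost of serving the customer."""
    revenue = c.avg_balance * MARGIN_RATE
    cost = c.branch_visits * BRANCH_COST + c.online_txns * ONLINE_COST
    return revenue - cost


customers = [
    Customer("A", avg_balance=10_000, branch_visits=100, online_txns=0),
    Customer("B", avg_balance=3_000, branch_visits=2, online_txns=200),
]
for c in customers:
    print(c.name, round(yearly_profit(c), 2))
```

Under these assumed costs, the customer with the larger balance loses money for the bank because of costly branch transactions, while the smaller, mostly online customer is profitable.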

DWH: Typical Applications


Direct Mail Marketing

- Targeted marketing
- Offering high bandwidth package NOT to all users
- Know from call detail records of web surfing
- Saves marketing expense, saving pennies
- Knowing your customer better


DWH: Typical Applications


Credit Risk Prediction

- Who should get a loan?
- Customer segregation, i.e. stable vs. rolling
- Qualitative decision making, NOT subjective
- Different interest rates for different customers
- Do not subsidize bad customers on the basis of good ones


DWH: Typical Applications


Yield Management

- Works for fixed inventory businesses
- The price of an item suddenly goes to zero
- Item prices vary for varying customers
- Examples: airlines, hotels, etc.
- E.g. the price of an air ticket depends on:
    - How far in advance was the ticket bought?
    - How many vacant seats were available?
    - How profitable is the customer?
    - Is the ticket one-way or return?
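
The four pricing factors listed for an air ticket can be combined in a toy fare rule. This is only a sketch: the function name, every multiplier, and the direction of each adjustment are assumptions chosen for illustration, not an airline's actual pricing model.

```python
def ticket_price(base_fare, days_in_advance, vacant_seats, total_seats,
                 profitable_customer=False, round_trip=False):
    """Toy fare rule; every multiplier below is an assumed value."""
    price = base_fare
    if days_in_advance < 7:
        price *= 1.5                  # late bookings pay more
    occupancy = 1 - vacant_seats / total_seats
    price *= 1 + 0.5 * occupancy      # fuller flights cost more
    if profitable_customer:
        price *= 0.9                  # assumed discount for valued customers
    if round_trip:
        price *= 1.8                  # return priced below twice one-way
    return round(price, 2)


early_empty = ticket_price(100, 30, 90, 100)
late_full_return = ticket_price(100, 2, 10, 100, round_trip=True)
print(early_empty, late_full_return)
```

Once the flight departs, an unsold seat is worth nothing, so a rule of this kind tries to sell each seat at the best price it can still fetch.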


DWH: Recent Applications


Agriculture Systems

- Agriculture related data collected for decades
- Meteorological data consists of 50+ attributes
- Decision making based on expert judgment
- Lack of integration results in underutilization
- What is required, in which amount and when?


DWH: Typical Early Adopters


- Financial service / insurance
- Retailing and distribution
- Telecommunications
- Transportation
- Government

Common thread: lots of customers and transactions.


DWH: End User Expectations


- Point and click access to data
- Insulation from DBMS structures
    - Want semantic data model, not 3rd normal form
- Integration with existing tools: Excel, SAS etc.
- Interactive response times for online analysis, but batch time is important as well.