Created with Print2PDF. To remove this line, buy a license at: http://www.software602.com/
The boundary between operational systems and the data warehouse is blurring with active data warehousing. For an ATM, an update that is not applied in real time means a lot of real trouble. A DWH is for strategic decision making based on historical data, so it does not hurt if the transactions of the last hour or day are absent.
The rate of update depends on:
Volume of data
Nature of business
Cost of keeping historical data
Benefit of keeping historical data
Data Warehousing - Fall 2010 2
FAST-NU, Islamabad
Created with Print2PDF. To remove this line, buy a license at: http://www.software602.com/
Decision makers typically don't work 24 hours a day, 7 days a week. An ATM system does. Once decision makers start using the DWH and reaping its benefits, they start liking it. They use the DWH more and more often, until they want it available 100% of the time. For a business that operates across the globe, 50% of the world may be sleeping at any one time, but the business is up 100% of the time. 100% availability is not a trivial task: loading strategies, refresh rates, etc. must be taken into account.
Implement warehouse
Integrate data
Test for bias / incorrectness
Program w.r.t. data
Design DSS system
Analyze results
Understand requirements
OLAP (Online Analytical Processing) queries must be executed in a small number of seconds.
Complex query scripts and large list selections can generally be executed in a small number of minutes.
Sophisticated clustering algorithms (e.g. in data mining) can generally be executed in a small number of hours, even for hundreds of thousands of customers.
OLTP:
SELECT tx_date, balance FROM tx_table WHERE account_ID = 829;
DWH:
SELECT balance, age, sal, gender FROM customer_table, tx_table WHERE age BETWEEN 30 AND 40 AND education = 'graduate' AND customer_table.custID = tx_table.customer_ID;
OLTP
Primary key used
No concept of primary index
May use a single table
Normally few rows returned
High selectivity of query
Indexing on primary key (unique)

DWH
Primary key NOT used
Primary index used
Mostly uses multiple tables
Normally many rows returned
Low selectivity of query
Indexing on primary index (non-unique)
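The contrast between the two query styles can be tried out on a toy database. The following is only a minimal sketch using Python's built-in sqlite3 module; the table and column names come from the two queries above, while the tx_id column and every row of data are hypothetical and exist purely for illustration.

```python
import sqlite3

# In-memory database; all rows below are made-up illustration data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE customer_table ("
    "custID INTEGER PRIMARY KEY, age INTEGER, sal INTEGER, "
    "gender TEXT, education TEXT)"
)
cur.execute(
    "CREATE TABLE tx_table ("
    "tx_id INTEGER PRIMARY KEY, account_ID INTEGER, customer_ID INTEGER, "
    "tx_date TEXT, balance INTEGER)"
)
cur.executemany(
    "INSERT INTO customer_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 35, 50000, "F", "graduate"),
        (2, 32, 42000, "M", "graduate"),
        (3, 55, 61000, "M", "graduate"),
        (4, 38, 30000, "F", "matric"),
    ],
)
cur.executemany(
    "INSERT INTO tx_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 829, 1, "2010-09-01", 1000),
        (2, 830, 2, "2010-09-01", 2500),
        (3, 829, 1, "2010-09-02", 900),
        (4, 831, 3, "2010-09-02", 7000),
        (5, 830, 2, "2010-09-03", 2400),
        (6, 832, 4, "2010-09-03", 150),
    ],
)

# OLTP style: one account, one table, few rows returned.
oltp_rows = cur.execute(
    "SELECT tx_date, balance FROM tx_table WHERE account_ID = ?", (829,)
).fetchall()

# DWH style: a join across tables on a range predicate, many rows returned.
dwh_rows = cur.execute(
    "SELECT balance, age, sal, gender "
    "FROM customer_table, tx_table "
    "WHERE age BETWEEN 30 AND 40 "
    "AND education = 'graduate' "
    "AND customer_table.custID = tx_table.customer_ID"
).fetchall()

print(len(oltp_rows), "rows from the OLTP-style query")
print(len(dwh_rows), "rows from the DWH-style query")
conn.close()
```

Even on six transactions the DWH-style query touches both tables and returns more rows than the point lookup; on a real warehouse the gap would be far larger.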
OLTP
Application specific
Multiple databases with repetition
Off the shelf application
Runs the business
Operational data
No summary
Fully normalized
OLTP
Operational processing
Transaction
Clerks, DBAs etc.
Day to day operation
Application oriented
Short, simple transaction
Read / write
Data in
High performance / availability
Transaction throughput
[Figure: data warehouse architecture. Labels: data sources (www data, archived data), meta data, data warehouse, data marts, ROLAP, query/reporting and analysis tools, clients (Tier 3), IT users, business users.]
Data sources are unstructured and heterogeneous.
Requirements are always changing.
Most computer scientists are trained on OLTP systems; those concepts are not valid for VLDB and DSS.
The scale factor in VLDB implementations is difficult to comprehend.
Performance impacts are often non-linear, O(n) vs. O(n log n), e.g. scanning vs. indexing.
Complex computer/database architectures.
Rapidly changing product characteristics.
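One possible reading of the O(n) vs. O(n log n) remark is that fetching rows one at a time through an index costs roughly log n comparisons per row, so a query that matches most of the table can cost more through the index than a single sequential scan. The sketch below is a rough cost model under that assumption only; the function names and the cost formulas are hypothetical and ignore real-world factors such as disk I/O and caching.

```python
import math


def full_scan_cost(n_rows: int) -> int:
    """Rows touched by one sequential pass over the whole table."""
    return n_rows


def index_access_cost(n_rows: int, n_matches: int) -> int:
    """Approximate comparisons when every matching row is located through
    a balanced tree index: about log2(n_rows) comparisons per row."""
    return n_matches * math.ceil(math.log2(n_rows))


n = 1_000_000
for matches in (1, 10_000, 500_000):
    print(
        f"matches={matches:>7}  "
        f"scan={full_scan_cost(n):>9}  "
        f"index={index_access_cost(n, matches):>9}"
    )
```

In this model the index wins easily for a highly selective OLTP-style lookup, and the scan wins once a low-selectivity DWH-style query matches a large share of the rows.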
Resource
Dominated by the sheer volume of data.
There are many ways to accommodate call-level detail:
Keeping only a few months of call-level detail.
Storing lots of call-level detail scattered over different storage media.
Storing only selective call-level detail, etc.
Unfortunately, for many kinds of processing, working at an aggregate level is simply not possible, because finding patterns becomes difficult.
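A small illustration of why aggregates can hide patterns: two customers with the same total usage can behave very differently at the call level. The records below are hypothetical, as is the choice of "before 6 a.m." as the pattern of interest.

```python
from collections import defaultdict

# Hypothetical call detail records: (customer, hour_of_day, minutes).
calls = [
    ("A", 9, 10),
    ("A", 10, 10),
    ("A", 14, 10),
    ("A", 16, 10),
    ("B", 2, 40),
]

# Aggregate level: total minutes per customer.
total_minutes = defaultdict(int)
for customer, hour, minutes in calls:
    total_minutes[customer] += minutes

# Call level: minutes used at night, visible only in the detail records.
night_minutes = defaultdict(int)
for customer, hour, minutes in calls:
    if hour < 6:
        night_minutes[customer] += minutes

for customer in sorted(total_minutes):
    print(customer, total_minutes[customer], night_minutes[customer])
```

Both customers show the same total, so the aggregate alone cannot distinguish the daytime caller from the night-time caller.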
Insurance data warehouses are similar to other data warehouses, BUT with a few exceptions:
They store data that is very old and is used for actuarial processing / analysis.
A typical business may change dramatically over 40-50 years, but not insurance.
In retailing or telecom there are a few important dates, but in the insurance environment there are many dates of many kinds.
Insurance data warehouses are similar to other data warehouses, BUT with a few exceptions (contd.):
Long operational business cycles, measured in years.
Processing time is measured in months, so the operating speed is different.
Transactions are not gathered and processed, but are kept in a kind of frozen state.
Thus a unique approach to design and implementation is needed.
Fraud Detection
Profitability Analysis
Direct Mail / Database Marketing
Credit Risk Prediction
Customer Retention Modeling
Yield Management
Inventory Management
ROI on any one of these applications can justify HW / SW and consultancy costs in most organizations.
Fraud Detection
By observing data usage patterns
People have typical purchase patterns
Deviation patterns
Certain cities are notorious for fraud
Certain items are bought with stolen cards
Stolen cards show similar behavior
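The idea of a deviation from a typical purchase pattern can be sketched with a simple z-score test. This is only one possible illustration, not the method used by any particular bank: the function name, the threshold of 3 standard deviations, and the purchase history are all hypothetical.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` sample standard
    deviations away from the cardholder's typical purchase amount.
    `history` needs at least two past amounts for stdev to be defined."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past purchases were identical: any other amount deviates.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past purchase amounts for one cardholder.
history = [42.0, 38.5, 51.0, 45.0, 40.0, 47.5]

print(is_suspicious(history, 46.0))
print(is_suspicious(history, 900.0))
```

A real system would combine many more signals from the slide, such as the city and the kind of item bought, rather than relying on the amount alone.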
Profitability Analysis
Banks know if they are profitable or not
They don't know which customers are profitable
Typically more than 50% are NOT profitable
They don't know which ones
Balance is not enough; transactional behavior is the key
Restructure products and pricing strategies
Lifetime profitability models (next 3-5 years)
Direct Mail Marketing
Targeted marketing
Offer a high-bandwidth package, but NOT to all users
Know who surfs the web from call detail records
Saves marketing expense, saving pennies
Know your customer better
Credit Risk Prediction
Who should get a loan?
Customer segregation, i.e. stable vs. rolling
Qualitative decision making, NOT subjective
Different interest rates for different customers
Do not subsidize bad customers on the basis of good ones
Yield Management
Works for fixed-inventory businesses
The price of an item suddenly goes to zero
Item prices vary for varying customers
Examples: airlines, hotels, etc.
E.g. the price of an air ticket depends on:
How far in advance was the ticket bought?
How many vacant seats were available?
How profitable is the customer?
Is the ticket one-way or return?
Systems
Agriculture-related data has been collected for decades
Meteorological data consists of 50+ attributes
Decision making is based on expert judgment
Lack of integration results in underutilization
What is required, in what amount, and when?
service / insurance
Retailing and distribution
Telecommunications
Transportation
Government
Common thread: lots of customers and transactions.
SAS etc. Interactive response times for online analysis but batch time is important as well.