
Lecture 2

Dr. N.P. Singh, Professor

Data warehouses keep growing for several reasons. One is the desire to use data from the Internet and to analyze clickstream data. Another is compliance: regulatory authorities are demanding that more data be retained for longer periods. A third is the move towards so-called active data warehousing. A fourth is the arrival of new data sources: Wal-Mart, for example, already has a data warehouse that contains hundreds of terabytes, yet it estimates that its implementation of RFID (radio frequency identification), once fully rolled out, will generate an additional 7 TB of data per day.

System infrastructure: The hardware, software, network, database management system, and personnel components of the infrastructure.
Metadata layer: Data about data. This includes, but is not limited to, definitions and descriptions of data items and business rules.
Data discovery: The process of understanding the current environment so it can be integrated into the warehouse.
Data acquisition: The process of loading data from the various sources.
Data distribution: The dissemination/replication of data to distributed data marts for specific segmented groups.
User analysis: The infrastructure required to support user queries and analysis.

A data warehouse may be as narrow as a personal data warehouse for a single manager covering a single year. It may be a functional, departmental, or divisional data warehouse. It may also be an enterprise data warehouse, which is more expensive and time-consuming to build.

There are three levels of data redundancy in a data warehouse environment:
"Virtual" or "point-to-point" data warehouses
Central data warehouses
Distributed data warehouses

Executives and managers
"Power" users (business and financial analysts, engineers, etc.)
Support users (clerical, administrative, etc.)

The customer manufactures more than 500 products in 2 locations, and sells its products from 17 locations to 35 regions across the country. Field sales teams organized along lines of business and territories support sales.

The company needed regular and real-time information to make decisions in a complex and dynamic environment.
The company had stringent reporting requirements.
With relevant data residing in disparate systems, creating consolidated reports for senior management was tedious and time-consuming.
Senior managers lacked quick and easy access to market and competitor data.

Reporting requirements of the following functional areas were considered:
Sales
Marketing
Finance
MIS
Distribution

Monthly trend analysis of the customer's market share and market rank compared to its competitors
Therapeutic category-wise market share and market rank of the customer's individual products compared to competitor products
Top 10 pharmaceuticals by Evolution Index
Top 100 customers by sales value
Analysis of salvage net across various dimensions like location, time, and product
Performance monitoring of Medical Sales Representatives (MSRs), since each MSR was assigned a geographical territory
Weeks of inventory analysis

MS SQL Server for the data warehouse
Data Transformation Services for handling extraction, transformation and loading
BI Portal for presentation
MS OLAP for analysis cubes
Reporting Services for developing reports

A data warehousing system should provide a complete solution for managing the flow of information from existing corporate databases and external sources into end-user decision support systems. It should make it easy for business users to find out what information exists in the warehouse, and provide tools for accessing and manipulating that information.

Design component
Data acquisition component
Data manager component
Management component
Information directory component
Data access component
Middleware component
Data delivery component

E-R Model

ORDER: #, DATE, PART #, QUANTITY
PART: #, DESCRIPTION, UNIT PRICE, SUPPLIER #
SUPPLIER: #, NAME, ADDRESS

1 ORDER can have 1 PART.
M PARTs can have 1 SUPPLIER.

What is it? Normalization is the process of creating small data structures from complex groups of data. It is a step-by-step, reversible process of replacing a collection of relations with an equivalent collection of relations that has fewer update anomalies and preserves dependencies. The two most important points are that the result has fewer update anomalies and that dependencies are preserved.

Emp-Num    Emp_Name       Store Branch    Department       Item_No  Item_Name    Sales_Price ($)
211306801  Arnold Jim     Downtown        Hardware         TR101    Router        35.00
                                                           SA 10    Saw           19.00
                                                           PT 65    Drill         21.00
                                                           AB 165   Lawnmower    245.00
301421011  Znud Bill      Dadeland        Home Appliances  TT 14    Humidifier   114.00
                                                           DS 104   Dishwasher   262.00
419846204  Belohlov, Jim  Culter          Auto Parts       MC 164   Snow Tire     85.00
                                                           AC1462   Alternator    65.00
                                                           BB1000   Battery       49.50
61247216   Boynton Tom    Fashion square  Mens Clothing    HS101    3-Pc Suit    215.00

Emp-Num    Emp_Name       Store Branch    Department       Item_No  Item_Name    Sales_Price ($)
211306801  Arnold Jim     Downtown        Hardware         TR101    Router        35.00
211306801  Arnold Jim     Downtown        Hardware         SA 10    Saw           19.00
211306801  Arnold Jim     Downtown        Hardware         PT 65    Drill         21.00
211306801  Arnold Jim     Downtown        Hardware         AB 165   Lawnmower    245.00
301421011  Znud Bill      Dadeland        Home Appliances  TT 14    Humidifier   114.00
301421011  Znud Bill      Dadeland        Home Appliances  DS 104   Dishwasher   262.00
419846204  Belohlov, Jim  Culter          Auto Parts       MC 164   Snow Tire     85.00
419846204  Belohlov, Jim  Culter          Auto Parts       AC1462   Alternator    65.00
419846204  Belohlov, Jim  Culter          Auto Parts       BB1000   Battery       49.50
61247216   Boynton Tom    Fashion square  Mens Clothing    HS101    3-Pc Suit    215.00
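The flat table above repeats each employee's name, store, and department on every row, which is where update anomalies come from. The following is a minimal Python sketch of one possible decomposition. It assumes, without the slides stating it, that Emp-Num determines the employee name, store branch, and department, and that Item_No determines the item name and price. The function names decompose and rejoin are hypothetical.

```python
# Flat rows copied from the table above: one row per employee/item pair.
FLAT = [
    ("211306801", "Arnold Jim", "Downtown", "Hardware", "TR101", "Router", 35.00),
    ("211306801", "Arnold Jim", "Downtown", "Hardware", "SA 10", "Saw", 19.00),
    ("211306801", "Arnold Jim", "Downtown", "Hardware", "PT 65", "Drill", 21.00),
    ("211306801", "Arnold Jim", "Downtown", "Hardware", "AB 165", "Lawnmower", 245.00),
    ("301421011", "Znud Bill", "Dadeland", "Home Appliances", "TT 14", "Humidifier", 114.00),
    ("301421011", "Znud Bill", "Dadeland", "Home Appliances", "DS 104", "Dishwasher", 262.00),
    ("419846204", "Belohlov, Jim", "Culter", "Auto Parts", "MC 164", "Snow Tire", 85.00),
    ("419846204", "Belohlov, Jim", "Culter", "Auto Parts", "AC1462", "Alternator", 65.00),
    ("419846204", "Belohlov, Jim", "Culter", "Auto Parts", "BB1000", "Battery", 49.50),
    ("61247216", "Boynton Tom", "Fashion square", "Mens Clothing", "HS101", "3-Pc Suit", 215.00),
]


def decompose(rows):
    """Split the flat rows into three smaller relations (assumed dependencies)."""
    employees = {}  # Emp-Num -> (name, store branch, department)
    items = {}      # Item_No -> (item name, sales price)
    sales = set()   # (Emp-Num, Item_No) pairs linking the two
    for emp, name, store, dept, item_no, item_name, price in rows:
        employees[emp] = (name, store, dept)
        items[item_no] = (item_name, price)
        sales.add((emp, item_no))
    return employees, items, sales


def rejoin(employees, items, sales):
    """Join the three relations back into flat rows."""
    return {
        (emp, *employees[emp], item_no, *items[item_no])
        for emp, item_no in sales
    }


employees, items, sales = decompose(FLAT)
# The decomposition is reversible: joining gives back exactly the flat rows.
assert rejoin(employees, items, sales) == set(FLAT)
```

After the split, each employee's store and department are stored once, so moving an employee to another department changes one entry instead of several rows.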


ER modeling is a logical design technique that seeks to eliminate data redundancy. It is a discipline used to illuminate the microscopic relationships among data elements. It is very helpful for transaction processing because it makes transactions simple and deterministic. The ER model for an enterprise has hundreds of logical entities (the figure on the next slide shows the ER model of an enterprise that manufactures products, sells them to chain retailers, and measures the retailers' sales). ERP-type systems have thousands of entities, and each is converted to a physical table.

End users cannot understand or remember an ER model.
Navigating an ER model is not possible.
There is no GUI that makes it easy to use.
Software cannot usually query a general ER model.
Using an ER model defeats the purpose of a data warehouse, i.e., high-performance retrieval of data.

Dimensional modeling is a logical design technique often used for data warehouses.
Dimensional modeling is the only viable technique for delivering data to end users in a data warehouse.
Every dimensional model is composed of one table with a multipart key, called the fact table (FT), and a set of smaller tables called dimension tables.
Each dimension table has a single-part primary key (PK) that corresponds to exactly one part of the multipart key in the fact table (Fig. 2). This structure is called a star join.
The multipart PK of the fact table is made up of two or more foreign keys and always expresses a many-to-many relationship.
The most useful facts in a fact table are numeric and additive, because a query usually fetches more than one record and the values must be combined.
Dimension tables contain textual information. Dimension attributes are the source of most of the interesting constraints in data warehouse queries and are almost always the row headers in the SQL answer set. Examples: lemon-flavored products via the flavor attribute in the product table, and radio promotions via..
Dimension tables are the entry points into the data warehouse.

Time dimension: Time_key (PK), SQL_date, Day_of_week, Week_number, Month, etc.
Store dimension: Store_key (PK), Store_ID, Store_name, Address, District, Floor type, etc.
Product dimension: Product_key (PK), SKU, Description, Brand, Category, Package_type, Size, Flavor, etc.
Customer dimension: Customer_key (PK), Customer_name, Purchase_profile, Credit_profile, Demographic_type, Address, etc.
Clerk dimension: Clerk_key (PK), Clerk_ID, Clerk_name, Clerk_grade, etc.
Promotion dimension: Promotion_key (PK), Promotion_name, Price_type, Ad_type, Display_type, etc.
Fact table: Time_key (FK), Product_key (FK), Store_key (FK), Customer_key (FK), Clerk_key (FK), Dollar_sold, Unit_sold, Dollars_cost

Figure: A dimensional model isolating the retail sales process from Fig. 1. The facts and dimensions can be found in the ER model.
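A star join over a schema like this one can be sketched with Python's built-in sqlite3 module. The sketch below keeps only two of the dimensions and a three-part fact key for brevity, and every row of data in it is invented for illustration. It constrains on a dimension attribute (Flavor), uses dimension attributes as row headers, and sums the additive facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,   -- single-part key
    description TEXT, brand TEXT, flavor TEXT
);
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY,     -- single-part key
    store_name TEXT
);
CREATE TABLE sales_fact (
    time_key INTEGER,
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    dollar_sold REAL, unit_sold INTEGER, dollars_cost REAL,
    PRIMARY KEY (time_key, product_key, store_key)  -- multipart key
);
""")

# Invented sample rows, not taken from the slides.
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Lemon Soda", "Fizz", "Lemon"),
    (2, "Lemon Tea", "Leafy", "Lemon"),
    (3, "Cola", "Fizz", "Cola"),
])
conn.executemany("INSERT INTO store_dim VALUES (?, ?)", [
    (1, "Downtown"),
    (2, "Dadeland"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 10.0, 5, 6.0),
    (2, 1, 1, 20.0, 10, 12.0),
    (1, 2, 2, 7.5, 3, 4.0),
    (1, 3, 1, 99.0, 40, 60.0),
    (2, 1, 2, 4.0, 2, 2.5),
])

# Star join: constrain on a dimension attribute, group by dimension
# attributes (the row headers), and add up the numeric facts.
STAR_QUERY = """
SELECT p.brand, s.store_name, SUM(f.dollar_sold), SUM(f.unit_sold)
FROM sales_fact AS f
JOIN product_dim AS p ON p.product_key = f.product_key
JOIN store_dim AS s ON s.store_key = f.store_key
WHERE p.flavor = 'Lemon'
GROUP BY p.brand, s.store_name
ORDER BY p.brand, s.store_name
"""
rows = conn.execute(STAR_QUERY).fetchall()
for row in rows:
    print(row)
```

Every query against a star schema has this same shape: the fact table in the middle, one join per dimension, constraints and row headers taken from dimension attributes.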

A dimensional model contains less information than the master ER model. The master ER model may have sales calls, order entry, shipment invoices, customer payments, and product returns all on the same diagram. Multiple processes that never coexist in a single data set at a single consistent point in time are shown on one diagram, which makes it overly complex.

Step 1:
Separate the ER diagram into its discrete business processes and model each of them separately.

Step 2:
Select the many-to-many relationships in the ER model that contain numeric and additive non-key facts, and designate them as fact tables.

Step 3:
Denormalize all the remaining tables into flat tables with single-part keys that connect directly to the fact tables. These are the dimension tables. If the same dimension table connects to two fact tables, it is part of both schemas and is referred to as conformed between the two dimensional schemas.
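Step 3 can be illustrated with a small Python sketch. It assumes a hypothetical normalized source in which a product points to a brand and a brand points to a category. These three tables and all of their rows are invented for illustration. The chain is flattened into one product dimension with a single-part key, and the same dimension is then shared by two fact tables, which is what makes it conformed.

```python
# Hypothetical normalized source tables (invented data).
CATEGORIES = {100: {"category": "Beverages"}, 200: {"category": "Snacks"}}
BRANDS = {
    10: {"brand": "Fizz", "category_id": 100},
    20: {"brand": "Crunchy", "category_id": 200},
}
PRODUCTS = {
    1: {"sku": "SKU-001", "description": "Lemon Soda", "brand_id": 10},
    2: {"sku": "SKU-002", "description": "Potato Chips", "brand_id": 20},
}


def denormalize_products(products, brands, categories):
    """Flatten product -> brand -> category into one row per product key."""
    dim = {}
    for product_key, product in products.items():
        brand = brands[product["brand_id"]]
        category = categories[brand["category_id"]]
        dim[product_key] = {
            "sku": product["sku"],
            "description": product["description"],
            "brand": brand["brand"],
            "category": category["category"],
        }
    return dim


def total_by_category(fact_rows, dim):
    """Sum an additive fact, using a dimension attribute as the row header."""
    totals = {}
    for product_key, value in fact_rows:
        category = dim[product_key]["category"]
        totals[category] = totals.get(category, 0) + value
    return totals


product_dim = denormalize_products(PRODUCTS, BRANDS, CATEGORIES)

# Two fact tables from two business processes share the same dimension.
sales_fact = [(1, 12.0), (2, 8.0), (1, 3.0)]  # (product_key, dollar_sold)
inventory_fact = [(1, 40), (2, 15)]           # (product_key, units_on_hand)

sales_totals = total_by_category(sales_fact, product_dim)
inventory_totals = total_by_category(inventory_fact, product_dim)
```

Because both fact tables use the same product keys and the same attribute values, results from the two schemas can be compared category by category.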

A dimensional model is a predictable, standard framework. Report writers, query tools, and user interfaces can all make strong assumptions about the dimensional model to make user interfaces more understandable and processing more efficient.
The dimensional model withstands unexpected changes in user behavior. Every dimension is equal.
Dimensional models can be changed gracefully by:
  Adding new unanticipated facts, as long as they are consistent with the fundamental grain of the existing fact table.
  Adding completely new dimensions.
  Adding new attributes to dimension tables.
  Breaking existing dimension records down to a lower level of granularity from a certain point in time forward.
All of the available aggregate navigation software packages and utilities depend on a very specific single structure of fact and dimension tables that is absolutely dependent on the dimensional model.
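The idea of graceful change can be sketched with Python's built-in sqlite3 module. The example assumes a tiny two-table schema with invented rows. The new fact dollars_discount and the new attribute package_type are added in place, and a query written before the change returns the same answer afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact (product_key INTEGER, dollar_sold REAL);
INSERT INTO product_dim VALUES (1, 'Fizz'), (2, 'Leafy');
INSERT INTO sales_fact VALUES (1, 10.0), (1, 20.0), (2, 7.5);
""")

# A report query written against the original schema.
QUERY = """
SELECT p.brand, SUM(f.dollar_sold)
FROM sales_fact AS f
JOIN product_dim AS p ON p.product_key = f.product_key
GROUP BY p.brand
ORDER BY p.brand
"""
before = conn.execute(QUERY).fetchall()

# Add a new fact at the same grain and a new dimension attribute.
# Both column names are hypothetical.
conn.executescript("""
ALTER TABLE sales_fact ADD COLUMN dollars_discount REAL DEFAULT 0;
ALTER TABLE product_dim ADD COLUMN package_type TEXT;
""")
after = conn.execute(QUERY).fetchall()

# The existing query is unaffected by the schema change.
assert before == after
```

No table had to be reloaded and no existing query had to be rewritten, which is the sense in which the change is graceful.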

Star schema
Snowflake
Constellation
