CHAPTER 4

THE CRM DATA WAREHOUSE

WHAT IS A DATA WAREHOUSE?


Data warehouse: a large reservoir of detailed and summary data that describes the organization and its activities, organized by the various business dimensions in a way that facilitates easy retrieval of information describing those activities.
Data mart: a subset of the data warehouse, tailored to meet the specialized needs of a particular group of users.
Top-down approach to data warehouse development: the data warehouse is created first, and the data marts are then derived from it.
Bottom-up approach to data warehouse development: the data marts are created first and then integrated.

Data Warehousing Objectives


(1) keep the warehouse data current;
(2) ensure that the warehouse data is accurate;
(3) keep the warehouse data secure;
(4) make the warehouse data easily available to authorized users;
(5) maintain descriptions of the warehouse data so that users and system developers can understand the meaning of each element.

Data Warehouse vs. DBMS

OLTP (on-line transaction processing)
Major task of a traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

OLAP (on-line analytical processing)
Major task of a data warehouse system
Data analysis and decision making

Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries


                     OLTP                                                    OLAP
users                clerk, IT professional                                  knowledge worker
function             day-to-day operations                                   decision support
DB design            application-oriented                                    subject-oriented
data                 current, up-to-date, detailed, flat relational, isolated   historical, summarized, multidimensional, integrated, consolidated
usage                repetitive                                              ad hoc
access               read/write, index/hash on primary key                   lots of scans
unit of work         short, simple transaction                               complex query
# records accessed   tens                                                    millions
# users              thousands                                               hundreds
DB size              100 MB to GB                                            100 GB to TB
metric               transaction throughput                                  query throughput, response
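The contrast in access patterns can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `sales` table, its columns, and the sample rows are assumptions made for the example, not part of the chapter.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "West", 2023, 100.0),
        (2, "West", 2024, 150.0),
        (3, "East", 2024, 200.0),
    ],
)

# OLTP-style access: a short read/write on one record, located by primary key
conn.execute("UPDATE sales SET amount = ? WHERE order_id = ?", (120.0, 1))
one_order = conn.execute(
    "SELECT amount FROM sales WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style access: a read-only query that scans many rows and consolidates them
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
```

The first pair of statements touches a single row, while the last one reads the whole table to produce a summary, which is the pattern the OLAP column describes.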

DATA WAREHOUSE ARCHITECTURE


Staging area: where data is prepared to be moved into the warehouse data repository and the metadata repository.
Metadata: data about data, or descriptions of the data in the data warehouse.
Exhibit 4.1: A Data Warehouse System Model

EXHIBIT 4.1 A DATA WAREHOUSE SYSTEM MODEL


[Figure: the data warehouse system, made up of a data gathering system, a staging area, a warehouse data repository, a metadata repository, an information delivery system, and a management and control component. Legend: data flow, control flow.]

A Data Warehouse System Model


Management and Control
Management and control component: like a traffic officer standing in the middle of a street intersection, controlling the flow of traffic through the intersection.

Staging Area
ETL: extraction, transformation, and loading, the activities of the staging area.
Extraction: obtaining data from the internal databases and files of systems, accomplished according to a schedule.
Transformation: a process that includes cleaning, standardizing, reformatting, and summarizing.
Loading: writing the data into the data warehouse.
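The three ETL activities can be sketched as three small Python functions. This is a minimal sketch under stated assumptions: the source rows, the `customer_sales` table, and the specific cleaning rules are hypothetical and chosen only to show each step.

```python
import sqlite3

# Hypothetical rows as they might arrive from an operational system
source_rows = [
    {"customer_no": " 1001 ", "state": "ca", "amount": "25.50"},
    {"customer_no": "1002", "state": "NY", "amount": "10.00"},
    {"customer_no": "1001", "state": "CA", "amount": "4.50"},
]


def extract(rows):
    """Extraction: obtain data from a source (here, an in-memory list)."""
    return list(rows)


def transform(rows):
    """Transformation: clean, standardize, reformat, and summarize."""
    totals = {}
    for row in rows:
        customer_no = int(row["customer_no"].strip())  # clean and reformat
        state = row["state"].strip().upper()           # standardize
        key = (customer_no, state)
        totals[key] = totals.get(key, 0.0) + float(row["amount"])  # summarize
    return [(no, state, total) for (no, state), total in sorted(totals.items())]


def load(conn, rows):
    """Loading: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer_no INTEGER, state TEXT, total_amount REAL)"
    )
    conn.executemany("INSERT INTO customer_sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source_rows)))
loaded = warehouse.execute(
    "SELECT * FROM customer_sales ORDER BY customer_no"
).fetchall()
```

In a real staging area the extraction step would run on a schedule against operational databases and files; the list used here only stands in for that source.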

WAREHOUSE DATA REPOSITORY

Where the warehouse data is stored within the computer system or systems.

Data Content
Customer picture: a compilation of geographic, demographic, activity, psychographic, and behavioral data.

Data Characteristics
The types of data to be processed, including considerations of data granularity, data hierarchies, and data dimensions.

Data Types
fixed-length format
variable-length format

Data Granularity
The degree of detail that is represented by the data; the greater the detail, the finer the granularity.

Data Hierarchies
Multiple attributes can describe a single entity. An attribute is a data element that identifies or describes an occurrence of a data entity (e.g., a particular customer is identified by a customer number attribute).
Exhibit 4.2: An Example of a Data Hierarchy

Data Dimensions
For example, a manager can query the data warehouse for a display of data according to salesperson, customer, product, and time.
Exhibit 4.3: Every Data Record Contains the Time Element
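Viewing the same facts along different dimensions can be sketched in a few lines of Python. The records, their values, and the `units_by` helper are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical sales records; each one carries the four dimensions named above
records = [
    {"salesperson": "3742", "customer": "C1", "product": "P1", "time": "2024-01", "units": 5},
    {"salesperson": "3742", "customer": "C2", "product": "P2", "time": "2024-01", "units": 3},
    {"salesperson": "4100", "customer": "C1", "product": "P1", "time": "2024-02", "units": 7},
]


def units_by(dimension, rows):
    """Total the units fact along one chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["units"]
    return dict(totals)


by_salesperson = units_by("salesperson", records)
by_product = units_by("product", records)
by_time = units_by("time", records)
```

The same three records yield three different displays, one per dimension, without any change to the underlying data.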

EXHIBIT 4.2 AN EXAMPLE OF A DATA HIERARCHY

Customer
  Customer number
  Customer age
  Customer gender
  Customer marital status
  Customer number of dependents
  Customer education
  Customer dwelling type
  Customer state
  Customer city
  Customer zip code

EXHIBIT 4.3 EVERY DATA RECORD CONTAINS THE TIME ELEMENT

Data record                Time element
Customer sales order       Sales order date
Customer statement         Statement date
Warehouse shipping order   Date shipped
Customer invoice           Invoice date
Customer payment           Payment date

METADATA REPOSITORY
Describes the flow of data from the time that the data is captured until it is archived. For example, metadata in the metadata repository for the customer number attribute would describe its format, editing rules, and so on.

TYPES OF METADATA
Metadata for Users (analysis): identification of the source systems, the time of the last update, the different report formats that are available, and how to find data in the data warehouse.
Metadata for Systems Developers: data that allows developers to maintain, revise, and reengineer the data warehouse system, including the various rules that were employed in creating the warehouse data repository, and the rules for extraction, cleansing, transforming, purging, and archiving.
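A metadata entry for the customer number attribute might be sketched as a small record. Everything here is an assumption made for illustration: the `AttributeMetadata` class, its field names, the six-digit format, and the sample values do not come from the chapter.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMetadata:
    """Hypothetical metadata record describing one warehouse attribute."""
    name: str
    source_system: str   # user metadata: where the data came from
    last_updated: str    # user metadata: time of the last update
    data_format: str     # human-readable description of the format
    edit_rule: str       # editing rule, expressed here as a regular expression

    def is_valid(self, value: str) -> bool:
        """Apply the editing rule to a candidate value."""
        return re.fullmatch(self.edit_rule, value) is not None


# Assumed values for illustration only
customer_number_meta = AttributeMetadata(
    name="customer_number",
    source_system="order_entry",
    last_updated="2024-01-31",
    data_format="6 digits",
    edit_rule=r"\d{6}",
)
```

A record like this serves both audiences: users read the source and update time, while developers rely on the format and editing rule when maintaining the load process.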


Data and Process Models: object diagrams and entity-relationship diagrams; use cases, use case diagrams, and data flow diagrams.
CASE Tools: CASE stands for computer-aided system engineering and is a way to use the computer to develop systems.
DBMS Systems: include a data dictionary component, which contains excellent descriptions of the data in the database or data warehouse.

HOW DATA IS STORED IN THE DATA WAREHOUSE


Dimension table: a list of all of the attributes that identify and describe a particular entity (Exhibit 4.4: A Sample Dimension Table).
Fact table: a list of all the facts that relate to some type of the organization's activity (Exhibit 4.5: A Sample Fact Table).

EXHIBIT 4.4 A SAMPLE DIMENSION TABLE

Customer
  Customer number
  Customer name
  Customer phone number
  Customer e-mail address
  Customer territory
  Customer credit code
  Customer standard industry code
  Customer city
  Customer state
  Customer zip code

EXHIBIT 4.5 A SAMPLE FACT TABLE

Commercial Sales Facts
  Actual sales units
  Budgeted sales units
  Actual sales amount
  Budgeted sales amount
  Sales discount amount
  Net sales amount
  Sales commission amount
  Sales bonus amount
  Sales tax amount

INFORMATION PACKAGES
Information package: a table that is maintained in the data warehouse repository that identifies both the dimensions and the facts that relate to a business activity (Exhibit 4.6: Information Package Format).
Key: a number, such as a customer number, that identifies a particular occurrence of the dimension (Exhibit 4.7: A Sample Information Package).

EXHIBIT 4.6 INFORMATION PACKAGE FORMAT

Subject: Name of business activity being measured

Dimension Name   Dimension Name   Dimension Name   Dimension Name
Dimension Key    Dimension Key    Dimension Key    Dimension Key
Dimension 1      Dimension 1      Dimension 1      Dimension 1
Dimension 2      Dimension 2      Dimension 2      Dimension 2
Dimension 3      Dimension 3      Dimension n      Dimension 3
Dimension 4      Dimension n                       Dimension 4
Dimension n                                        Dimension n

Facts: Numeric measures of the business activity

EXHIBIT 4.7 A SAMPLE INFORMATION PACKAGE

Subject: Commercial sales

Time       Salesperson        Customer               Product
Time key   Salesperson key    Customer key           Product key
Hour       Salesperson name   Customer name          Product name
Day        Sales branch       Customer territory     Product model
Month      Sales region       Customer credit code   Product brand
Quarter    Subsidiary                                Product line
Year

Facts: Actual sales units, budgeted sales units, actual sales amount, budgeted sales amount, sales discount amount, net sales amount, sales commission amount, sales bonus amount, sales tax amount
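The sample information package can also be written down as a plain data structure. The sketch below is an assumption about representation only: the dictionary layout and the helper functions `dimension_keys` and `fact_table_columns` are hypothetical, and the column order follows one reading of Exhibit 4.7.

```python
# Exhibit 4.7 expressed as a plain dictionary (the layout is an assumption)
information_package = {
    "subject": "Commercial sales",
    "dimensions": {
        "Time": ["Time key", "Hour", "Day", "Month", "Quarter", "Year"],
        "Salesperson": ["Salesperson key", "Salesperson name", "Sales branch",
                        "Sales region", "Subsidiary"],
        "Customer": ["Customer key", "Customer name", "Customer territory",
                     "Customer credit code"],
        "Product": ["Product key", "Product name", "Product model",
                    "Product brand", "Product line"],
    },
    "facts": [
        "Actual sales units", "Budgeted sales units", "Actual sales amount",
        "Budgeted sales amount", "Sales discount amount", "Net sales amount",
        "Sales commission amount", "Sales bonus amount", "Sales tax amount",
    ],
}


def dimension_keys(package):
    """Return the key of each dimension (listed first in every column)."""
    return [columns[0] for columns in package["dimensions"].values()]


def fact_table_columns(package):
    """List one key per dimension, followed by the numeric facts."""
    return dimension_keys(package) + package["facts"]


keys = dimension_keys(information_package)
columns = fact_table_columns(information_package)
```

Listing the keys ahead of the facts shows how each recorded measurement is tied to one occurrence of every dimension.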

STAR SCHEMAS

Star schema: the arrangement of an information package that usually identifies multiple dimension tables for a single fact table and has the appearance of a star, with the fact table in the center and the dimension tables forming the points (Exhibit 4.8: Star Schema Format).
Foreign keys: a means of linking the fact table to the dimension tables. The keys listed at the top of the fact table identify other, foreign tables as opposed to the fact table itself (Exhibit 4.9: A Sample Star Schema).

EXHIBIT 4.8 STAR SCHEMA FORMAT

Fact table (center): Business activity name
  Dimension 1 key
  Dimension 2 key
  Dimension n key
  Measurable fact 2
  Measurable fact 4
  Measurable fact 5
  Measurable fact n

Dimension tables (points):
  Dimension 1 name: Dimension 1 key, Dimension 1 hierarchy
  Dimension 2 name: Dimension 2 key, Dimension 2 hierarchy
  Dimension n name: Dimension n key, Dimension n hierarchy

EXHIBIT 4.9 A SAMPLE STAR SCHEMA


Fact table (center): Product sales facts
  Keys: Product key, Customer key, Salesperson key, Time key
  Facts: Sales units, Gross sales amount, Sales discount amount, Net sales amount, Sales commission amount

Dimension tables (points):
  Customer: Customer key, Customer name, Customer type, Customer credit code, Salesperson number, Sales territory, Standard industry code
  Salesperson: Salesperson key, Salesperson name, Sales region, Sales branch
  Time: Time key, Day, Month, Quarter, Year
  Product (labeled "Customer payment" in the exhibit): Product key, Product name, Product unit price, Product quantity

Example of Star Schema


Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_street, country
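The star schema above can be sketched with Python's built-in `sqlite3` module. Table and column names follow the example; the column types, the sample rows, and the choice to leave `avg_sales` out as a stored column are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       province_or_street TEXT, country TEXT);
-- Fact table: one foreign key per dimension, followed by the measures
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time(time_key),
    item_key     INTEGER REFERENCES item(item_key),
    branch_key   INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
-- Hypothetical sample rows
INSERT INTO time VALUES (1, 15, 'Mon', 1, 'Q1', 2024), (2, 10, 'Wed', 4, 'Q2', 2024);
INSERT INTO item VALUES (1, 'Laptop', 'Acme', 'computer', 'wholesale'),
                        (2, 'Phone', 'Acme', 'phone', 'wholesale');
INSERT INTO branch VALUES (1, 'Downtown', 'retail');
INSERT INTO location VALUES (1, 'Main St', 'Vancouver', 'BC', 'Canada');
INSERT INTO sales VALUES (1, 1, 1, 1, 2, 2000.0),
                         (2, 1, 1, 1, 1, 1000.0),
                         (2, 2, 1, 1, 3, 1500.0);
""")

# Each dimension is exactly one join away from the fact table
sales_by_quarter_and_item = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON t.time_key = s.time_key
    JOIN item AS i ON i.item_key = s.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
```

Because every dimension table links directly to the fact table, a query along any dimension needs only a single join per dimension.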

Example of Snowflake Schema


Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
    supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
    city: city_key, city, province_or_street, country
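The difference a snowflake schema makes to a query can be sketched with `sqlite3`. Only the location branch of the example is modeled, and the column types and sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension is normalized: city attributes move to their own table
CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_street TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales    (location_key INTEGER REFERENCES location(location_key),
                       units_sold INTEGER, dollars_sold REAL);
-- Hypothetical sample rows
INSERT INTO city VALUES (1, 'Vancouver', 'BC', 'Canada'), (2, 'Chicago', 'IL', 'USA');
INSERT INTO location VALUES (10, 'Main St', 1), (11, 'Oak Ave', 1), (12, 'Lake Dr', 2);
INSERT INTO sales VALUES (10, 2, 200.0), (11, 1, 100.0), (12, 4, 400.0);
""")

# Reaching country now takes two joins: sales -> location -> city
dollars_by_country = conn.execute("""
    SELECT c.country, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN location AS l ON l.location_key = s.location_key
    JOIN city AS c ON c.city_key = l.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
```

The city attributes are stored once per city rather than once per location, at the cost of an extra join whenever a query needs them.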

Example of Fact Constellation

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table
  Keys: time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_street, country
  shipper: shipper_key, shipper_name, location_key, shipper_type
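A fact constellation, in which several fact tables share dimension tables, can be sketched with `sqlite3`. Only the shared item dimension is modeled, and the column types and sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by two fact tables
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales    (item_key INTEGER REFERENCES item(item_key),
                       units_sold INTEGER, dollars_sold REAL);
CREATE TABLE shipping (item_key INTEGER REFERENCES item(item_key),
                       units_shipped INTEGER, dollars_cost REAL);
-- Hypothetical sample rows
INSERT INTO item VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales VALUES (1, 2, 2000.0), (1, 1, 1000.0), (2, 3, 1500.0);
INSERT INTO shipping VALUES (1, 3, 30.0), (2, 2, 10.0), (2, 1, 5.0);
""")

# Each fact table is summarized separately, then lined up on the shared dimension;
# joining the two fact tables directly would multiply rows and inflate the sums
sold_vs_shipped = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM sales AS s
             WHERE s.item_key = i.item_key),
           (SELECT SUM(sh.units_shipped) FROM shipping AS sh
             WHERE sh.item_key = i.item_key)
    FROM item AS i
    ORDER BY i.item_name
""").fetchall()
```

The shared dimension is what allows facts from two different business activities to be compared item by item.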

DATA WAREHOUSE NAVIGATION


Summary information: preprocessed data that provides the user with exactly the content that is needed.
Drill down (top-down) navigation: the user seeks more detail in an effort to understand the summary information.
Roll up navigation: the user summarizes data to see the forest rather than the trees or to prepare summary graphs.
Drill across navigation: the user moves from one data hierarchy to another, e.g., information on customer sales, salesperson sales, and then product sales.
Exhibit 4.10: Navigation Paths

EXHIBIT 4.10 NAVIGATION PATHS


[Figure: navigation paths across Hierarchy 1 (customer), Hierarchy 2 (salesperson), and Hierarchy n (product).
Summary information (Net sales for the Western sales region)
Drill down to, and roll up from, detailed information (Net sales for salesperson 3742)
Drill through to detailed data (Sales units for salesperson 3742)
Drill across from one hierarchy to another]
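Roll up and drill down along the salesperson hierarchy can be sketched in plain Python. The detail rows and amounts are hypothetical; only the Western region and salesperson 3742 echo the exhibit.

```python
from collections import defaultdict

# Hypothetical detailed data: (sales region, salesperson, net sales)
detail = [
    ("Western", "3742", 500.0),
    ("Western", "3742", 250.0),
    ("Western", "5120", 300.0),
    ("Eastern", "6001", 400.0),
]


def roll_up(rows):
    """Summarize net sales to the sales-region level."""
    totals = defaultdict(float)
    for region, _salesperson, net_sales in rows:
        totals[region] += net_sales
    return dict(totals)


def drill_down(rows, region):
    """Break one region's summary into net sales per salesperson."""
    totals = defaultdict(float)
    for row_region, salesperson, net_sales in rows:
        if row_region == region:
            totals[salesperson] += net_sales
    return dict(totals)


summary = roll_up(detail)
western_detail = drill_down(detail, "Western")
```

Drilling down on the Western total explains it in terms of its salespeople, and rolling those figures up again returns the same regional total.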

DATA WAREHOUSE SECURITY


Information systems security: protection against damage, destruction, theft, and misuse.
Exhibit 4.11: The Security Action Cycle

The Corporate Security Environment
Deterrence: security policies and procedures that are intended to deter security violations, such as guidelines for proper system use and the requirement that users change their passwords periodically.
Prevention: measures aimed at those persons who ignore deterrence, including such things as locks on computer rooms, user passwords, and file permissions.
Detection: proactive actions include system audits, reports of suspicious activity, and virus scanning software; reactive actions take the form of investigations.
Remedies: respond with warnings, reprimands, termination of employment, or legal action.

EXHIBIT 4.11 THE SECURITY ACTION CYCLE


Stage           Outcome            Objective
1. Deterrence   Deterred abuse     Maximize deterred abuse
2. Prevention   Prevented abuse    Maximize prevented abuse
3. Detection    Undetected abuse   Minimize undetected abuse
4. Remedies     Unpunished abuse   Minimize unpunished abuse

Deterrence feedback returns from the later stages to stage 1.

Data Warehouse Security Measures
Network security: using procedures such as firewalls to restrict access to the network that houses the servers and data files, databases, data warehouses, and data marts.
Data security: obtaining access to data once access to the network has been achieved; data files may be located on multiple servers on the network, and the user must provide a second password.
Database or data warehouse security: the security checks of the database management system (DBMS), which may include a third password, verification of user name, and also verification of access to particular data tables, records, and even record fields.
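The three layers can be pictured as successive checks that must all pass before data is read. The sketch below is hypothetical throughout: the `User` record, the credential store, and the `can_read` function are invented for illustration, and passwords are kept in plain text only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical credential record; real systems would not store plain text."""
    name: str
    network_password: str
    data_password: str
    dbms_password: str
    readable_tables: set = field(default_factory=set)


# Assumed credential store for the example
USERS = {"alice": User("alice", "net-pw", "data-pw", "db-pw", {"customer"})}


def can_read(name, network_pw, data_pw, dbms_pw, table):
    """Apply the three security layers in order; any failure denies access."""
    user = USERS.get(name)
    if user is None:
        return False
    if network_pw != user.network_password:   # layer 1: network security
        return False
    if data_pw != user.data_password:         # layer 2: data security
        return False
    if dbms_pw != user.dbms_password:         # layer 3: DBMS password
        return False
    return table in user.readable_tables      # layer 3: table-level access


allowed = can_read("alice", "net-pw", "data-pw", "db-pw", "customer")
wrong_second_password = can_read("alice", "net-pw", "oops", "db-pw", "customer")
wrong_table = can_read("alice", "net-pw", "data-pw", "db-pw", "payroll")
```

Getting past the network check alone is not enough: a wrong second password or a request for a table outside the user's permissions is still refused.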
