Session 4 Case Study Retail Case

Retail Sales
Overview
Four-step dimensional design process

Transaction-level fact tables
Additive and non-additive facts
Sample dimension table attributes
Causal dimensions
Degenerate dimensions
Extending an existing dimension model
Snowflaking dimension attributes
Avoiding the too many dimensions trap
Surrogate keys
Four-Step Dimensional Design Process

1. Select the business process to model.
Not a business department or function
E.g., purchasing, ordering, shipping, invoicing, inventorying
2. Declare the grain of the business process.
Specifies exactly what an individual fact table row represents
E.g., individual line item on a sales ticket, daily snapshot of the inventory levels for a product
Four-Step Dimensional Design Process

3. Choose the dimensions that apply to each fact table row.
Q: How do business people describe the data that results from the business process?
E.g., date, product, store, customer, transaction type
4. Identify the numeric (measured) facts that will populate each fact table row.
Q: What are we measuring?
Typical facts are numeric additive figures
E.g., quantity ordered, dollar cost amount
In making decisions regarding the 4 steps, consider both the user requirements and the realities of the source data
Retail Case Study
Large grocery chain: 100 grocery stores over
10 regions
Each store:
Departments: grocery, frozen foods, dairy, meat,

produce, bakery, floral, health/beauty aids, etc.
60,000 products (SKUs = stock keeping units) on
shelves
55,000 SKUs with UPCs
5,000 SKUs without UPCs but with assigned SKU
numbers
Data is collected:
from cash registers into a point-of-sale (POS)

system
at back door where vendors make deliveries
Retail Case Study Contd
Management concerns
Logistics of ordering, stocking, and selling

products
Maximizing profit
Product pricing
Lowering cost of acquisition and overhead
Use of promotions to increase sales
temporary price reductions
newspaper ads
grocery store displays
coupons
Step 1. Select the Business Process
Decide what business process to model, by

combining an understanding of the business
requirements with an understanding of data
realities.
The first dimensional model built should be the one
with the most impact,
that answers the most pressing business questions, and
that is readily accessible for data extraction.
In retail case study: POS retail sales

Business Question: What products are selling in
which stores on what days and under what
promotional conditions?
Step 2. Declare the Grain
What level of data detail should be made

available in the dimensional model?
Choose the most atomic information
captured by the business process.
Atomic data
Most detailed, cannot be subdivided
Facilitates ad hoc, unexpected usage and
ability to drill down to details
Case study grain: individual line item on

a POS transaction
Step 3. Choose the Dimensions
A careful grain statement determines the

primary dimensions.
It is then usually possible to add
additional dimensions.
If an additional desired dimension violates
the grain by causing additional fact rows
to be generated, then the grain statement
must be revised to accommodate this
dimension.
Case study dimensions: date, product,
store, promotion
Preliminary Retail Sales Schema
POS Sales Transaction Fact
Date Key (FK)
Product Key (FK)
Store Key (FK)
Promotion Key (FK)
POS Transaction Number
Other facts TBD
Date Dimension
Date Key (PK)
Date attributes TBD
Product Dimension
Product Key (PK)
Product attributes TBD
Store Dimension
Store Key (PK)
Store attributes TBD
Promotion Dimension
Promotion Key (PK)
Promotion attributes TBD
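
The preliminary schema can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample values are assumptions, and the attributes the slide marks as TBD are reduced to a single placeholder column each.

```python
import sqlite3

# Hypothetical names throughout; attributes marked TBD on the slide are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim      (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, sku_number TEXT);
CREATE TABLE store_dim     (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE pos_sales_fact (
    date_key      INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key   INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key     INTEGER NOT NULL REFERENCES store_dim(store_key),
    promotion_key INTEGER NOT NULL REFERENCES promotion_dim(promotion_key),
    pos_transaction_number TEXT NOT NULL,  -- degenerate dimension
    sales_quantity INTEGER                 -- stand-in for "other facts TBD"
);
""")
conn.execute("INSERT INTO date_dim VALUES (1, '2024-01-01')")
conn.execute("INSERT INTO product_dim VALUES (1, 'SKU-001')")
conn.execute("INSERT INTO store_dim VALUES (1, 'Store A')")
conn.execute("INSERT INTO promotion_dim VALUES (1, 'No Promotion in Effect')")
conn.execute("INSERT INTO pos_sales_fact VALUES (1, 1, 1, 1, 'T-1001', 3)")

# One fact row (one POS line item) joined to its four dimensions.
row = conn.execute("""
    SELECT d.full_date, p.sku_number, s.store_name, pr.promotion_name, f.sales_quantity
    FROM pos_sales_fact f
    JOIN date_dim d       ON d.date_key = f.date_key
    JOIN product_dim p    ON p.product_key = f.product_key
    JOIN store_dim s      ON s.store_key = f.store_key
    JOIN promotion_dim pr ON pr.promotion_key = f.promotion_key
""").fetchone()
print(row)
```

Every fact row carries one foreign key per dimension, which is what lets the business question "what sold, where, when, under which promotion" be answered with plain joins.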
Step 4. Identify the Facts
Picking the business measurements for the fact

table: true to the grain.
Case study - Facts collected by POS system:
Sales quantity, sales price/unit, sales $ amount,

standard cost $ amount
Gross Profit = sales $ amount - standard cost $ amount
Recommendation: Include in fact table even

though it can be calculated. Eliminates the
possibility of user error.
For non-additive measurements such as

percentages and ratios (e.g., gross margin) store
the numerator (gross profit) and denominator ($
revenue) in the fact table. The ratio can be
calculated in a data access tool for any slice of the
fact table. Caution: Calculate the ratio of the
sums, not the sum of the ratios
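
The caution about ratios can be checked with a few made-up fact rows. The figures below are illustrative only and do not come from the case study.

```python
# Illustrative fact rows: (gross_profit, dollar_revenue)
rows = [(10.0, 100.0), (30.0, 100.0), (5.0, 10.0)]

total_profit = sum(gp for gp, _ in rows)       # 45.0
total_revenue = sum(rev for _, rev in rows)    # 210.0

# Correct gross margin for the slice: ratio of the sums.
ratio_of_sums = total_profit / total_revenue

# Incorrect: adding per-row margins (0.1 + 0.3 + 0.5) is not a margin at all.
sum_of_ratios = sum(gp / rev for gp, rev in rows)

print(round(ratio_of_sums, 4), round(sum_of_ratios, 4))
```

Storing the numerator and denominator as separate additive facts is what makes the first calculation possible for any slice of the fact table.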
Date Dimension
Ubiquitous in every data mart

Use verbose, self-explanatory values rather than
coded values. They are used as column headers
in reports. By decoding in the database, we
ensure consistency across different application
environments.
E.g., Holiday Indicator use values: Holiday,

Nonholiday; as opposed to Y/N
Date Key should be an integer rather than a date

data type
Data warehouses need an explicit date dimension
table to describe fiscal periods, seasons, holidays,
weekends, and other calendar calculations that
are not supported by the SQL date function.
If transaction time is of interest, we may need a
separate Time Dimension table
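
A minimal sketch of how date dimension rows might be generated, assuming a sequential integer key and verbose indicator values. The holiday list, the column names, and the start date are assumptions made for illustration.

```python
from datetime import date, timedelta

HOLIDAYS = {date(2024, 1, 1)}  # assumed holiday calendar, illustration only
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_date_dimension(start, days):
    """Return one dimension row per calendar day, starting at `start`."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "date_key": offset + 1,                    # integer key, not a date type
            "full_date": d.isoformat(),
            "day_of_week": DAY_NAMES[d.weekday()],
            # Verbose values rather than Y/N codes, usable as report labels.
            "weekday_indicator": "Weekend" if d.weekday() >= 5 else "Weekday",
            "holiday_indicator": "Holiday" if d in HOLIDAYS else "Nonholiday",
        })
    return rows

date_dim = build_date_dimension(date(2024, 1, 1), 7)
for r in date_dim:
    print(r)
```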
Product Dimension
Describes every SKU in the store

Fill this dimension with as many descriptive
attributes as possible.
Robust dimension attributes deliver robust
analytic slicing and dicing capabilities.
Hierarchies = groups of attributes
Merchandise hierarchy
SKUs roll up to brands to categories to

departments.
Each is a many-to-one relationship
Although there will be redundancy, no need to

normalize. Given the relative size of the
dimension (as compared to the fact table) space
saving is minimal.
Store Dimension
The store dimension: Store Key

(PK), Store Name, Store Number
(Natural Key), Store Address,
Possible to represent multiple
hierarchies in a dimension table
Store to any geographic attribute (e.g.,

ZIP, county, state)
Store to store district to region
Promotion Dimension
Describes the promotion conditions under which a

product is sold
Called a causal dimension: it describes factors
thought to cause a change in product sales (price
reductions, ads, displays, coupons)
Could keep all 4 causal mechanisms in a single
dimension:
They are highly correlated, so not much difference in
space requirements
More efficient browsing for finding out how various
promotions are used together
Or split into 4 separate dimensions:
May be more understandable to business
Administration may be more straightforward
To avoid null keys in the fact table (violation of

referential integrity), for line items not being
promoted include a row in the promotion dimension
to indicate No Promotion in Effect
Factless Fact Table
Q: Which products were under promotion but did

not sell?
Cannot answer yet. POS sales fact table has only
products that were sold
Answer: Create Promotion Coverage Factless
Fact Table
Factless Fact Table = has no measurement metrics

Contains date, product, store, and promotion keys
Two-step process to answer Q:
Query Promotion Coverage table: products under

promotion on given date
From POS Sales Fact table: products sold
Answer is the set difference of above
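
The two-step answer above can be sketched with Python sets. The key values and the tuple layout are hypothetical; in practice each set would be the result of a query against the corresponding fact table.

```python
# Promotion coverage factless fact rows: (date_key, product_key, store_key, promotion_key)
promotion_coverage = {(1, 101, 1, 7), (1, 102, 1, 7), (1, 103, 1, 8)}
# POS sales fact rows reduced to the same four keys
pos_sales = {(1, 101, 1, 7), (1, 104, 1, 0)}

def promoted_but_unsold(coverage, sales, date_key, store_key):
    """Products on promotion in a store on a date, minus products sold there that day."""
    promoted = {p for (d, p, s, _) in coverage if d == date_key and s == store_key}
    sold = {p for (d, p, s, _) in sales if d == date_key and s == store_key}
    return promoted - sold  # set difference

unsold = promoted_but_unsold(promotion_coverage, pos_sales, date_key=1, store_key=1)
print(sorted(unsold))  # [102, 103]
```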
Degenerate Dimension (DD)
Dimension keys used in fact table without

corresponding dimension tables
In case study: POS Transaction #
Still useful for grouping by transaction
Common DDs: order numbers, invoice
numbers
Fact table primary key: Product Key and
POS Transaction Number
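
As a small illustration of grouping by a degenerate dimension, the sketch below collects line items by POS transaction number. The row layout and values are assumptions.

```python
from collections import defaultdict

# Fact rows: (pos_transaction_number, product_key, sales_quantity)
fact_rows = [("T-1", 101, 2), ("T-1", 102, 1), ("T-2", 101, 4)]

# The transaction number has no dimension table, yet it still groups line items.
basket = defaultdict(list)
for txn, product_key, qty in fact_rows:
    basket[txn].append((product_key, qty))

items_per_transaction = {txn: len(lines) for txn, lines in basket.items()}
print(items_per_transaction)  # {'T-1': 2, 'T-2': 1}
```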
Retail Schema Extensibility
Original schema extends gracefully

because POS transaction data was
modeled at its most granular level.
Premature aggregation limits ability to
extend if new dimensions do not apply to
higher grain
Case study new dimensions:
Frequent Shopper
Clerk
Time of Day
Schema Extensibility
Dimensional models can handle extensions without
invalidating existing applications:
New dimension attributes: simply add columns
to the dimension table. If a new attribute is only available
after a point in time, populate old dimension records
with something like Not Available
New dimensions: add foreign key fields to the fact
table
New measured facts: add to the fact table. If not at
the same grain, a separate fact table is needed
Dimension becoming more granular: create a
new dimension. May imply a more granular fact table,
in which case the fact table may have to be rebuilt.
Addition of a completely new data source
involving existing and new dimensions: usually
needs a new fact table
Resisting Dimension Normalization
Snowflaking = Dimension table normalization
Redundant attributes are removed from the denormalized

dimension table and are placed in normalized secondary
dimension tables
Fully snowflaked schema = 3NF ER diagram
The dimension tables must not be normalized, and should

remain as flat tables.
Numerous tables and joins usually translate into slower
query performance.
Efforts to normalize any of the tables in a dimensional
database solely in order to save disk space are a waste of
time. Disk space savings gained by normalizing the
dimension tables are typically less than one percent of
the total disk space needed for the overall schema.
Normalized dimension tables destroy the ability to browse
within a dimension or across dimensions (e.g., list
package types for each brand in a category). SQL needed
becomes too complex.
The fact table is naturally normalized.
Too Many Dimensions
Too many dimensions increase space

requirements for the fact table.
A very large number of dimensions
typically means that several dimensions
are not completely independent and
should be combined.
A single hierarchy should not be captured
in separate dimensions.
Surrogate Keys
Surrogate keys are integers assigned sequentially as

needed to populate a dimension. They serve to join
dimension tables to the fact table.
Avoid embedding intelligence in the data warehouse
keys.
Benefits:
Surrogate keys buffer the DW environment from

operational changes. What happens when operations
decide to recycle account numbers after some period of
inactivity? Fine for operational systems, but problematic
for DW if it is using account numbers as a PK.
Can more easily integrate data from multiple operational
systems, even if they lack consistent source keys.
Performance advantages because small size of surrogate
keys leads to smaller fact tables
Surrogate keys are used to support one of the primary
techniques for handling changes in dimension table
attributes.
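
One possible way to assign surrogate keys is sketched below. The class name, the method name, and the idea of keying the lookup on a (source system, natural key) pair are assumptions for illustration; handling recycled natural keys would need additional logic, such as effective dates.

```python
class SurrogateKeyGenerator:
    """Hands out sequential integer keys that carry no business meaning."""

    def __init__(self):
        self._next_key = 1
        self._lookup = {}  # (source_system, natural_key) -> surrogate key

    def key_for(self, source_system, natural_key):
        composite = (source_system, natural_key)
        if composite not in self._lookup:
            self._lookup[composite] = self._next_key
            self._next_key += 1
        return self._lookup[composite]

gen = SurrogateKeyGenerator()
k1 = gen.key_for("billing", "ACCT-17")
k2 = gen.key_for("crm", "ACCT-17")      # same natural key, different source system
k3 = gen.key_for("billing", "ACCT-17")  # already seen, so the key is reused
print(k1, k2, k3)  # 1 2 1
```

Because the warehouse key is independent of the operational identifiers, two source systems with clashing account numbers can still be loaded into one dimension.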
Inventory
Overview
Value chain implications

Inventory periodic snapshot model,
transaction and accumulating
snapshot models
Semi-additive facts
Enhanced inventory facts
Data Warehouse bus architecture
and matrix
Conformed dimensions and facts
Value Chain
The value chain identifies the natural,
logical flow of an organization's primary
activities.
Operational source systems produce
transactions or snapshots at each step in
the value chain. They generate
interesting performance metrics along the
way.
Each business process generates one or
more fact tables.
Inventory Models
Inventory periodic snapshot
Inventory level of each product measured daily
(or weekly), represented as a separate row in
a fact table
Inventory transactions
As products move through the warehouse, all
transactions with impact on inventory levels
are recorded
Inventory accumulating snapshot
One fact table row for each product, updated as
the product moves through the warehouse
Inventory Periodic Snapshot Model
Business need
Analysis of daily quantity-on-hand inventory
levels by product and store
Business process
Retail store inventory
Granularity
Daily inventory by product at each store
Dimensions
Date, product, store
Fact
Quantity on hand
Inventory Periodic Snapshot Model Challenge
Very dense (huge) fact table
As opposed to retail sales, which was sparse

because only about 10% of products sell each day
60,000 items in 100 stores = 6,000,000 rows

If 14 bytes per row: 84MB per day
One-year period: 365 x 84MB = 30GB
Solution: Reduce snapshot frequencies over time
Last 60 days at daily level

Weekly snapshots for historical data
For a 3-year period: 208 snapshots vs.
3 x 365 = 1095 snapshots; reduction by a factor of about 5
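
The sizing arithmetic above can be reproduced directly. The split of the 208 snapshots into 60 daily and 148 weekly ones is an assumption inferred from the slide's totals.

```python
products, stores, bytes_per_row = 60_000, 100, 14

rows_per_snapshot = products * stores                    # 6,000,000 rows per day
bytes_per_snapshot = rows_per_snapshot * bytes_per_row   # 84,000,000 bytes = 84 MB
bytes_per_year = 365 * bytes_per_snapshot                # 30,660,000,000 bytes, about 30 GB

daily_snapshots = 60
weekly_snapshots = 148  # assumed: the slide's 208 total minus the 60 daily snapshots
reduced = daily_snapshots + weekly_snapshots
all_daily = 3 * 365
reduction_factor = all_daily / reduced

print(rows_per_snapshot, bytes_per_snapshot, bytes_per_year)
print(reduced, all_daily, round(reduction_factor, 2))
```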
Semiadditive Facts
Inventory levels (quantity on hand) are
additive across products or stores, but
NOT across dates = semi-additive facts
Compare to Retail Sales:
once the product is sold it is not counted again
Static level measurements (inventory,

balances) are not additive across date
dimension; to aggregate over time use
average over number of time periods.
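
A small worked example of semi-additivity, using made-up quantities for a single product in two hypothetical stores over three days:

```python
# Quantity on hand keyed by (date_key, store); illustrative values only.
on_hand = {
    (1, "A"): 10, (1, "B"): 20,
    (2, "A"): 8,  (2, "B"): 16,
    (3, "A"): 6,  (3, "B"): 12,
}
dates = sorted({d for d, _ in on_hand})

# Additive across stores: total on hand for each date.
total_by_date = {d: sum(q for (dd, _), q in on_hand.items() if dd == d) for d in dates}

# Not additive across dates: average the daily totals over the number of periods.
average_on_hand = sum(total_by_date.values()) / len(dates)

# Summing across dates counts the same stock repeatedly and has no business meaning.
misleading_sum = sum(on_hand.values())

print(total_by_date, average_on_hand, misleading_sum)
```

Note that the divisor is the number of time periods, not the number of fact rows, which is why a plain SQL AVG over the fact table would give the wrong answer here.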
Enhanced Inventory Facts
Number of turns = total quantity sold / daily average
quantity on hand
Days supply = final quantity on hand / average quantity
sold
Gross profit = value at latest selling price - value at cost
Gross margin = gross profit / value at latest selling price
GMROI (Gross Margin Return On Inventory)
GMROI = number of turns * gross margin
measures effectiveness of inventory investment
high = lot of turns and more profit, low = low turns and low
profit
Need additional facts:
quantity sold, value at cost, value at latest selling price
GMROI is not additive and, therefore, is not stored in
the enhanced fact table. It is calculated from the constituent
columns.
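
The formulas on this slide can be combined into a short calculation. The function name, parameter names, and input figures below are illustrative assumptions; only the formulas come from the slide.

```python
def gmroi(total_quantity_sold, daily_avg_quantity_on_hand,
          value_at_cost, value_at_latest_selling_price):
    """GMROI = number of turns * gross margin, computed from constituent facts."""
    turns = total_quantity_sold / daily_avg_quantity_on_hand
    gross_profit = value_at_latest_selling_price - value_at_cost
    gross_margin = gross_profit / value_at_latest_selling_price
    return turns * gross_margin

# Illustrative inputs: 200 units sold, 50 on hand on average,
# inventory worth 600 at cost and 1000 at the latest selling price.
result = gmroi(200, 50, 600.0, 1000.0)
print(round(result, 2))  # 4 turns * 0.4 margin = 1.6
```

Keeping only the additive inputs in the fact table and computing GMROI at query time follows the same "ratio of the sums" rule used for gross margin in the retail sales schema.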
Inventory Transactions Model
Record every transaction that affects inventory
Receive product
Place product into inspection hold
Release product from inspection hold
Return product to vendor due to inspection failure
Place product in bin
Authorize product for sale
Pick product from bin
Package product for shipment
Ship product to customer
Receive product from customer
Return product to inventory from customer return
Remove product from inventory
Inventory Transactions Model - Cont
Dimensions: date, warehouse, product, vendor,

inventory transaction type.
The transaction-level fact table contains the most
detailed information possible about the inventory.
It is useful for measuring the frequency and
timing of specific transaction types.
It is impractical for broad data warehouse
questions that span dates or products.
To give a more cumulative view of a process,
some form of snapshot table often accompanies a
transaction fact table.
Inventory Accumulating Snapshot Model
Build one record in the fact table for each product delivery to
the warehouse
Track disposition of a product until it leaves the warehouse
Receiving
Inspection
Bin placement
Authorization to sell
Picking
Boxing
Shipping
The philosophy of the inventory accumulating snapshot fact

table is to provide an updated status of the product shipment
as it moves through above milestones.
Rarely used in long-running, continuously replenished
inventory processes.
More on this in chapter 5.
Value Chain Integration
Both business and IT organizations are interested

in value chain integration
Desire to look across the business to better
evaluate overall performance
Data marts may correspond to different business
processes
Need to look consistently at dimensions shared
between business processes
Need an integrated data warehouse architecture
If dimension table attributes in various marts are
identical, each mart is queried separately; the
results are then outer-joined based on a common
dimension attribute = drill across
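
Drill across can be sketched as a full outer join of two separately computed result sets on a conformed attribute. The mart names, product names, and numbers are made up for illustration.

```python
# Each mart is queried separately and summarized by the shared attribute.
sales_by_product = {"Cola": 120, "Chips": 75}       # from a hypothetical sales mart
inventory_by_product = {"Cola": 40, "Bread": 15}    # from a hypothetical inventory mart

def drill_across(left, right):
    """Full outer join of two result sets on their common row header."""
    merged = {}
    for key in sorted(set(left) | set(right)):
        merged[key] = (left.get(key), right.get(key))  # None where a mart has no row
    return merged

report = drill_across(sales_by_product, inventory_by_product)
for product, (sold, on_hand) in report.items():
    print(product, sold, on_hand)
```

The join only works because both marts label the product identically, which is the practical meaning of a conformed dimension.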
Data Warehouse Bus Architecture
Cannot build the enterprise data warehouse in one
step.
Building isolated pieces will defeat the consistency
goal.
Need an architected incremental approach:
the data warehouse bus architecture.
By defining a standard bus interface for the data
warehouse environment, separate data marts can
be implemented by different groups at different
times. The separate data marts can be plugged
together and usefully coexist if they adhere to the
standard.
Data Warehouse Bus Architecture Contd
During architecture phase, team

designs a master suite of
standardized dimensions and facts
that have uniform interpretation
across the enterprise.
Separate data marts are then
developed adhering to this
architecture.
Data Warehouse Bus Matrix
The rows of the bus matrix correspond to
business processes (data marts)
Separate rows should be created if:
the sources are different,

the processes are different, or
a row represents more than what can be tackled in
a single implementation iteration.
Creating the DW bus matrix is a very important

up-front deliverable of a DW implementation. The
DW bus matrix is a hybrid resource: technical
design tool, project management tool, and
communication tool.
Conformed Dimensions
Conformed dimensions are:
identical, or
strict mathematical subsets of the most granular,
detailed dimension.
Conformed dimensions have consistent
Dimension keys
Attribute column names
Attribute definitions
Attribute values
If two marts have dimensions (e.g., customer,

product) that are not conformed, then they
cannot be used together
Types of Dimension Conformity
Mean same thing
Single shared table or physical copy
Consistent data content, data interpretation, user
presentation
Rolled-up level of granularity
Roll-up dimensions conform to the base-level
atomic dimension if they are a strict subset of that
atomic dimension. (see Fig. 3.9)
Dimension subset at same level of granularity
At same level but one represents only a subset of
rows
Combination of above
Centralized Dimension Authority
The major responsibility of the centralized

dimension authority is to:
establish,
maintain, and
publish the conformed dimensions to all client
data marts.
90% of up-front data architecture effort

Political challenge
Conformed Facts
In general, fact table data is not
duplicated explicitly in multiple data
marts.
If facts live in more than one location,
then their definitions and equations must
be the same and they must be called the
same.
If it is impossible to conform a fact
exactly, then different names should be
given to different interpretations. This
will make it less likely that incompatible
facts will be used together in a calculation.
Procurement
Overview
Value chain reinforcement

Blended versus separate transaction
schema
Slowly changing dimension techniques
Procurement Case Study
Procurement involves a wide range of activities:
Negotiation of contracts
Issuing of purchase requisitions & POs
Tracking receipts
Authorizing payments
Common analytic requirements:
Which materials/products are purchased most

frequently? Who supplies them? At what prices?
Across the enterprise, are there opportunities to
negotiate contracts by consolidating suppliers,
single sourcing?
Are we purchasing from the preferred vendors?
How are vendors performing?
Procurement Transactions
1. Business process to model: Procurement
Transactions: purchase requisitions, purchase orders, shipping notifications,
receipts, and payments
2. Grain: 1 row per procurement transaction
3. Key dimensions:
Transaction date
Product
Vendor (1 row for each vendor)
Contract terms (1 row for each set of terms negotiated with a vendor)
Procurement transaction type
Transaction date and product are conformed dimensions
4. Measured facts:
Procured units
Transaction amount
Multiple- vs. Single-Transaction Fact Tables: Issues
Business users view purchase orders, shipping notices,
warehouse receipts, and vendor payments as separate and
unique processes
Several procurement transactions come from different
source systems
Purchasing system: purchase requisitions and purchase orders
Warehousing system: shipping notices and warehouse receipts
Accounts payable system: vendor payments
Several transaction types have different dimensionality
E.g., discounts applicable to vendor payments but not to other
types of transactions
Control numbers such as PO # generated during the procurement
process are candidates for degenerate dimensions
Design decision:
Build separate fact tables for each transaction type, or
Build a blended transaction fact table with a transaction type
dimension
No simple answer.
Multiple- vs. Single-Transaction Fact Tables - Cont.
Questions to ask:
What are the users' analytic requirements?
How do users usually analyze data?
Do they analyze multiple transaction types together
or do they usually look at a single transaction type?
Are there multiple unique business processes?
If yes, leaning towards separate tables
Are multiple source systems involved?
What is the dimensionality of the facts? Do some
dimensions apply only to some transaction types?
Solution: multiple transaction fact tables
Multiple Fact Tables
Multiple fact tables for procurement

processes
Advantages:
Richer, more descriptive dimensions and

attributes
Simplified staging activities, since operational
data exist in separate source systems
Loading data into separate fact tables will be
less complex than attempting to integrate from
multiple sources
Disadvantages:
More time to manage and administer: more

tables to load, index, and aggregate
Slowly Changing Dimensions
We have assumed dimensions to be
independent of time, but some
dimension attributes (other than natural keys) may
change slowly with time
Need to track change without a full-blown
normalized structure and without making
every dimension time-dependent
For each attribute in our dimension
tables, we must specify a strategy to
handle change
Techniques for Dealing with Dimension Change

Change Type 1: Overwrite the Value
Overwrite the old attribute value in the
dimension row, replacing it with the
current value. The attribute always
reflects the most recent assignment
The type 1 response is easy to
implement, but it does not maintain any
history of prior attribute values
Question: Is there a business need for
retaining the old attribute value?

Change Type 2: Add a Dimension Row
Create a new dimension row reflecting the new attribute value
Two separate surrogate keys: one for the old row and one for
the new
Could also use a most recent row indicator to tell us which
of the two rows is current
Fact table is again untouched
Adding a dimension row is the primary technique for
accurately tracking SCD attributes
Advantage:
Product Key is used as the primary key instead of the SKU
number, which is the natural key and is the same for both rows
New dimension row automatically partitions history in the fact table:
pre-change fact rows use the pre-change surrogate key
No need to revisit preexisting aggregation tables
Disadvantage:
Accelerated dimension table growth
Does not allow us to associate the new attribute value with the
old fact history or vice versa
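
A minimal sketch of the Type 2 response, assuming an in-memory list as the product dimension. The function names, the `current` indicator column, and the SKU and department values are hypothetical.

```python
from itertools import count

_next_key = count(1)   # sequential surrogate keys
product_dim = []       # dimension rows as dicts

def add_product(sku, department):
    row = {"product_key": next(_next_key), "sku": sku,
           "department": department, "current": True}
    product_dim.append(row)
    return row["product_key"]

def type2_change(sku, new_department):
    """Flag the current row for the SKU as old and add a row with a new surrogate key."""
    for row in product_dim:
        if row["sku"] == sku and row["current"]:
            row["current"] = False
    return add_product(sku, new_department)

old_key = add_product("ABC922-Z", "Education")
fact_rows = [{"product_key": old_key, "sales_quantity": 5}]      # pre-change fact row
new_key = type2_change("ABC922-Z", "Strategy")
fact_rows.append({"product_key": new_key, "sales_quantity": 2})  # post-change fact row

print(product_dim)
print(fact_rows)
```

Existing fact rows are never updated; they keep pointing at the pre-change surrogate key, which is how history gets partitioned automatically.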

Change Type 3: Add a Dimension Column
Add a new dimension column containing the old

attribute value (E.g., Prior Department)
Overwrite the old value with the new
More appropriate when there is a need to associate
new attribute values with old fact history
E.g., business need to track both old and new values of

department attribute both forward and backward
Management can use either value for analysis

Allows for observing new and historical fact data by
either the new or prior attribute values
Used less frequently
Inappropriate to track numerous intermediate attribute
values
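
The Type 3 response can be sketched as a small function that keeps the prior value in its own column. The `prior_` column naming and the sample values are assumptions for illustration.

```python
def type3_change(row, attribute, new_value):
    """Copy the current value into a 'prior_<attribute>' column, then overwrite it."""
    updated = dict(row)  # leave the input row unchanged
    updated["prior_" + attribute] = row[attribute]
    updated[attribute] = new_value
    return updated

product = {"product_key": 1, "sku": "ABC922-Z", "department": "Education"}
product = type3_change(product, "department", "Strategy")
print(product)
```

The surrogate key does not change, so all fact rows, old and new, can be grouped by either `department` or `prior_department`. A second change would overwrite the prior value, which is why this technique does not suit numerous intermediate values.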
Hybrid Slowly Changing Dimension Techniques
Two approaches that combine the
basic SCD techniques:
Predictable changes with multiple
version overlays
Unpredictable changes with single-version overlay
These approaches provide more
flexibility at the cost of greater
complexity
Predictable Changes with Multiple Version Overlays
Used in cases of sales organization realignments
Example: Over a 5-year period the sales organization
is reorganized five times.
At first sight, a candidate for the Type 2 approach (add
dimension row), but the business requirements are more
complex. E.g.,
Report each year's sales using the district map for that
year
Report each year's sales using the district map from an
arbitrary different year
Report an arbitrary span of years' sales using a single
district map from a chosen year.
Type 3 is also inappropriate because there are more than 2 district maps
Because changes are predictable, an extension of Type
3 is possible: multiple District columns
Current District; District 2001; District 2002;
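
A sketch of how multiple district columns support reporting by any year's district map. The sales representative dimension, its keys, the district names, and the amounts are all hypothetical.

```python
# One dimension row per sales rep, with one district column per yearly map.
sales_rep_dim = {
    1: {"current_district": "North", "district_2001": "East", "district_2002": "North"},
    2: {"current_district": "South", "district_2001": "East", "district_2002": "South"},
}
# Fact rows: (sales_rep_key, year, sales_amount)
sales_fact = [(1, 2001, 100), (2, 2001, 50), (1, 2002, 70), (2, 2002, 30)]

def sales_by_district(fact_rows, district_column, years):
    """Total sales for the chosen years, grouped by the chosen district map."""
    totals = {}
    for rep_key, year, amount in fact_rows:
        if year in years:
            district = sales_rep_dim[rep_key][district_column]
            totals[district] = totals.get(district, 0) + amount
    return totals

# 2001 sales reported under the 2002 district map.
print(sales_by_district(sales_fact, "district_2002", {2001}))
# A span of years reported under the single 2001 map.
print(sales_by_district(sales_fact, "district_2001", {2001, 2002}))
```

Choosing the column at query time is what lets the same fact rows be rolled up under any of the maps.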
Unpredictable Changes with Single-Version Overlay
Preserve historical accuracy surrounding

unpredictable attribute changes while
supporting the ability to report historical
data according to the current values
Issue a new dimension row (type 2) to
capture the change and add a new
dimension column to track the historical
value (type 3). Also, overwrite Current
Department value (Type 1).
More Rapidly Changing Dimensions
Break off the rapidly changing
attributes into one or more
separate dimensions
Two foreign keys in fact table:
1. Primary dimension table
2. Rapidly changing attribute(s)