

Gamper, Free University of Bolzano, DWDM 2012-13

Data Warehousing and Data Mining
Case Studies
Acknowledgements: I am indebted to Michael Böhlen for providing me with his
slides, upon which these lecture notes are based.
Order Management
Inventory Management/1
Consider a large grocery chain with a central warehouse and several retail stores
Advanced retail business requires inventory
  Making sure the right product is in the right store at the right time minimizes out-of-stocks and reduces overall inventory carrying costs.
  The retailer needs the ability to analyze daily quantity-on-hand inventory levels by product and store.
Design dimensional models that support the analysis of inventories for retail businesses (grocery stores).
Inventory Management/2
The value chain identifies the natural, logical flow of an organization's primary activities.
Operational systems provide snapshots at each step with interesting data and performance metrics.
Value chain (figure): Retailer Issues Purchase Order -> Deliveries at Retailer Warehouse -> Retailer Warehouse -> Deliveries at Retail -> Retail Store -> Retail Store Sales
Inventory Management/3
3 different inventory models
  Inventory periodic snapshot
  Inventory transactions
  Inventory accumulating snapshot

Inventory Periodic Snapshot Model/1
Model 1: Inventory Periodic Snapshot
  Every day (or at some other regular time interval) the inventory level of each product is measured and stored as a new row in the fact table
Business process: Analysis of retail store inventory
Granularity: Daily inventory by product at each individual store
Dimensions: Date, product, and store
Facts/measures: Quantity on hand
Date Dimension
  Date Key (PK)
  Date Attributes ...
Store Inventory Snapshot Fact
  Date Key (FK)
  Product Key (FK)
  Store Key (FK)
  Quantity on Hand
Store Dimension
  Store Key (PK)
  Store Attributes ...
Product Dimension
  Product Key (PK)
  Product Attributes ...
Inventory Periodic Snapshot Model/2
Inventory generates dense snapshot tables
  In contrast, the POS Retail Sales table was sparse
  Consequently, the inventory fact table is growing fast
    60,000 products x 100 stores = 6 million rows each time
    With a row width of 14 bytes, this is 84 MB each time
    1 year of daily snapshots would be 30 GB
Reduce snapshot frequencies over time
  Last 60 days of inventory at daily level
  Weekly snapshots for older data
  Instead of 1,095 snapshots in 3 years, only 208 snapshots would be required
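The sizing figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 MB = 1,000,000 bytes), 365-day years, and that a partial week of older data still needs one weekly snapshot; these assumptions are not stated on the slide.

```python
import math

# Figures from the slide
products = 60_000
stores = 100
row_width_bytes = 14

rows_per_snapshot = products * stores                              # rows per daily snapshot
mb_per_snapshot = rows_per_snapshot * row_width_bytes / 1_000_000  # size of one snapshot in MB
gb_per_year = mb_per_snapshot * 365 / 1_000                        # one year of daily snapshots

# Snapshot counts over 3 years: all daily vs. 60 daily + weekly for older data
total_days = 3 * 365
daily_window = 60
weekly_snapshots = math.ceil((total_days - daily_window) / 7)      # assumption: round up
reduced_snapshots = daily_window + weekly_snapshots

print(rows_per_snapshot, mb_per_snapshot, round(gb_per_year, 2))
print(total_days, reduced_snapshots)
```

Under these assumptions a year of daily snapshots comes to roughly 30.7 GB, which the slide rounds to 30 GB.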
Inventory Periodic Snapshot Model/3
The quantity on hand is a semi-additive measure
  Can be summarized across products and stores, but not across time
  Different in the POS Retail Sales table: sold entities are counted only once
All measures that record a static level (inventory, financial account balance, measures of intensity, e.g., temperature) are inherently non-additive across time and possibly other dimensions.
  Can be aggregated along the time dimension by averaging
A note about the SQL AVG function
  Cannot be used to compute the average over time, since it averages over the number of rows
  Avg inventory over a cluster of 3 products in 4 stores across 7 days would divide the summed value by 84 (3 x 4 x 7 rows) rather than by the 7 days
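The AVG pitfall can be reproduced with the slide's own numbers: 3 products, 4 stores, 7 days. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names and the constant quantity of 10 per row are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE inventory_snapshot "
    "(date_key INT, product_key INT, store_key INT, quantity_on_hand INT)"
)
# 7 days x 3 products x 4 stores = 84 rows, each with 10 units on hand (made-up data)
rows = [(d, p, s, 10) for d in range(1, 8) for p in range(1, 4) for s in range(1, 5)]
con.executemany("INSERT INTO inventory_snapshot VALUES (?, ?, ?, ?)", rows)

# AVG divides the summed quantity by the number of rows (84)
(naive_avg,) = con.execute(
    "SELECT AVG(quantity_on_hand) FROM inventory_snapshot"
).fetchone()

# Average over time: sum over products and stores, divide by the number of days (7)
(avg_over_time,) = con.execute(
    "SELECT SUM(quantity_on_hand) * 1.0 / COUNT(DISTINCT date_key) "
    "FROM inventory_snapshot"
).fetchone()

print(naive_avg, avg_over_time)
```

With this data the cluster holds 120 units every day, so the average daily inventory is 120, while AVG reports the per-row value of 10.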
Inventory Transactions Model/1
Model 2: Inventory Transactions
  Record every transaction that affects inventory
Inventory transactions in the store chain
  Receive product
  Place product into inspection hold
  Release product from inspection hold
  Return product to vendor due to inspection failure
  Place product in bin
  Authorize product for sale
  Pick product for shipment
  Ship product to customer
  Receive product from customer
  Return product to inventory from customer return
  Remove product from inventory

Inventory Transactions Model/2
Contains most detailed information, e.g.,
  How many shipments from a given vendor?
  Which products needed more than one round of inspection?
Reconstruction of exact inventory numbers is possible, but not practical!
  Used in combination with other fact tables
Date Dimension
Warehouse Inventory Transaction Fact
  Date Key (FK)
  Product Key (FK)
  Warehouse Key (FK)
  Vendor Key (FK)
  Inventory Transaction Type Key (FK)
  Inventory Transaction Quantity
Warehouse Dimension
  Warehouse Key (PK)
  Warehouse Name
  Warehouse Address
  Warehouse City
  Warehouse Zip
  ... and more
Vendor Dimension
Product Dimension
Inventory Transaction Type Dimension
  Inventory Transaction Type Key (PK)
  Inventory Transaction Type Description
  Inventory Transaction Type Group
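To illustrate why reconstructing inventory levels from transactions is possible but impractical, the sketch below replays every transaction up to a given date. The transaction type names, their signs, and the sample rows are hypothetical; a real system would have to replay the full history for every product and warehouse.

```python
from collections import defaultdict

# Hypothetical effect of each transaction type on the on-hand quantity
SIGN = {
    "receive": +1,
    "return_to_vendor": -1,
    "ship_to_customer": -1,
    "customer_return": +1,
}

# (date_key, product_key, warehouse_key, transaction_type, quantity), made-up rows
transactions = [
    (1, "P1", "W1", "receive", 100),
    (2, "P1", "W1", "return_to_vendor", 5),
    (3, "P1", "W1", "ship_to_customer", 30),
    (5, "P1", "W1", "customer_return", 2),
]

def on_hand(transactions, as_of_date):
    """Replay all transactions up to as_of_date to get inventory levels."""
    levels = defaultdict(int)
    for date_key, product, warehouse, tx_type, qty in transactions:
        if date_key <= as_of_date:
            levels[(product, warehouse)] += SIGN[tx_type] * qty
    return dict(levels)

print(on_hand(transactions, 3))
print(on_hand(transactions, 5))
```

Every level query scans the whole transaction history, which is why the slide suggests using this fact table together with a snapshot table.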
Inventory Accum. Snapshot Model/1
Model 3: Inventory Accumulating Snapshot
  One row in the fact table for each shipment of a particular product to the warehouse
  This row tracks the disposition of the shipment through the warehouse
    i.e., provides an updated status of the shipment as it moves through the warehouse
Assume that the inventory goes through a series of events, e.g., receiving, inspection, bin placement, authorization to sell, picking, boxing, and shipping.
Inventory Accum. Snapshot Model/2
Date Received Dimension
Warehouse Inventory Accumulating Fact
  Date Received Key (FK)
  Date Inspected Key (FK)
  Date Placed in Inventory Key (FK)
  Date Authorized to Sell Key (FK)
  Date Picked Key (FK)
  Date Boxed Key (FK)
  Date Shipped Key (FK)
  Date of Last Return Key (FK)
  Product Key (FK)
  Warehouse Key (FK)
  Vendor Key (FK)
  Quantity Received
  Quantity Inspected
  Quantity Returned to Vendor
  Quantity Placed in Bin
  Quantity Authorized to Sell
  Quantity Picked
  Quantity Boxed
  Quantity Shipped
  Quantity Returned by Customer
  Quantity Returned to Inventory
  Quantity Damaged
  Quantity Lost
  Unit Cost
  Unit List Price
  Unit Average Price
  Unit Recovery Price
Product Dimension
Date Picked Dimension
Date Authorized to Sell Dim
Date Placed in Inventory Dim
Date Inspected Dimension
Date Boxed Dimension
Date Shipped Dimension
Date of Last Return Dim
Vendor Dimension
Warehouse Dimension
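Unlike the other two models, an accumulating snapshot row is updated in place as the shipment reaches each milestone. The sketch below models one such row with a reduced, hypothetical set of milestones and field names (the slide's fact table has many more columns and uses date keys rather than dates).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical subset of the milestones listed in the fact table
MILESTONES = ("inspected", "placed_in_inventory", "shipped")

@dataclass
class ShipmentRow:
    product_key: int
    warehouse_key: int
    vendor_key: int
    date_received: date
    quantity_received: int
    date_inspected: Optional[date] = None
    quantity_inspected: int = 0
    date_placed_in_inventory: Optional[date] = None
    quantity_placed_in_inventory: int = 0
    date_shipped: Optional[date] = None
    quantity_shipped: int = 0

def record_milestone(row, milestone, when, quantity):
    """Overwrite the date and quantity of one milestone on the existing row."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    setattr(row, f"date_{milestone}", when)
    setattr(row, f"quantity_{milestone}", quantity)
    return row

row = ShipmentRow(1, 1, 1, date(2012, 10, 1), 100)
record_milestone(row, "inspected", date(2012, 10, 2), 100)
record_milestone(row, "shipped", date(2012, 10, 9), 98)

# Lags between milestones are easy to derive because all dates sit on one row
days_in_warehouse = (row.date_shipped - row.date_received).days
print(days_in_warehouse)
```

Milestones that have not happened yet stay empty (here `date_placed_in_inventory`), which is typical for rows that are still moving through the pipeline.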
Shared Dimensions
Value Chain Integration
  Provide analysis across the business to better evaluate the performance (not just at the individual department level)
  End-to-end perspective, from high-level management to customer
  Requires integration and consistent handling/use of data
  Individual fact tables for processes + shared dimensions
Shared dimensions are crucial to design data marts that can be integrated
  Allow combining performance measurements for different processes (also known as drill across)
Store Dimension
Promotion Dimension
Date Dimension
Product Dimension
Warehouse Dimension
Vendor Dimension
POS Retail Sales Transaction Fact
Retail Inventory Snapshot Fact
Warehouse Inventory Transaction Fact
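A drill-across query can be sketched as follows: each fact table is aggregated separately to the same level of a shared dimension, and only then are the results combined. The example uses Python's sqlite3 module; table names, column names, and data are illustrative assumptions, not the schemas from the lecture.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE pos_sales_fact (date_key INT, product_key INT, store_key INT, sales_quantity INT);
CREATE TABLE inventory_snapshot_fact (date_key INT, product_key INT, store_key INT, quantity_on_hand INT);
INSERT INTO product_dim VALUES (1, 'A'), (2, 'A'), (3, 'B');
INSERT INTO pos_sales_fact VALUES (1, 1, 1, 5), (1, 2, 1, 3), (1, 3, 1, 4);
INSERT INTO inventory_snapshot_fact VALUES (1, 1, 1, 20), (1, 2, 1, 10), (1, 3, 1, 8);
""")

# Aggregate each process on its own, then join on the shared (conformed) attribute
query = """
WITH sales AS (
    SELECT p.brand AS brand, SUM(f.sales_quantity) AS sold
    FROM pos_sales_fact f JOIN product_dim p ON f.product_key = p.product_key
    WHERE f.date_key = 1
    GROUP BY p.brand
),
inv AS (
    SELECT p.brand AS brand, SUM(f.quantity_on_hand) AS on_hand
    FROM inventory_snapshot_fact f JOIN product_dim p ON f.product_key = p.product_key
    WHERE f.date_key = 1
    GROUP BY p.brand
)
SELECT sales.brand, sales.sold, inv.on_hand
FROM sales JOIN inv ON sales.brand = inv.brand
ORDER BY sales.brand
"""
result = con.execute(query).fetchall()
print(result)
```

Joining the two fact tables directly, without aggregating first, would multiply rows and inflate both measures; the shared product dimension is what makes the two aggregates comparable.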
Data Warehouse Bus Architecture
Data Warehouse Bus Architecture
  A standard bus interface for a data warehouse environment that allows separate data marts to be implemented and successfully integrated
  Based on conformed (similar) dimensions that are shared by the data marts
Guides the overall design and breaks down the development process into small chunks (DMs)
Data Warehouse Bus Matrix/1
Data Warehouse Bus Matrix
  Tool to create and document the bus architecture
  Rows represent business processes (translate into DMs)
  Columns represent a suite of standardized dimensions

Data Warehouse Bus Matrix/2
Creating the DW bus matrix is one of the most important up-front deliverables of a DW
  Create a comprehensive list of dimensions before filling in the matrix
The rows provide a concise overview of the dimensionality of the individual DMs
The columns show the interaction between the DMs and the common/shared dimensions
Conformed Dimensions
Conformed Dimension
  Either identical or strict mathematical subsets of the most granular, detailed dimension
  Have consistent dimension keys, and consistent column names and values
Roll-up dimensions conform to the base-level atomic dimension
  The brand table is a strict subset of the product table
Product Dimension
  Product Key (PK)
  Product Description
  SKU Number
  Brand Description
  Subcategory Description
  Category Description
  Department Description
  Package Type Description
  Package Size
  ... and more
Brand Dimension
  Brand Key (PK)
  Brand Description
  Subcategory Description
  Category Description
  Department Description
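One way to keep a roll-up dimension conformed is to derive it from the atomic dimension instead of maintaining it by hand. The sketch below builds a brand dimension from product rows; the attribute names, the sample products, and the sequential key assignment are assumptions made for illustration.

```python
# Attributes the brand dimension shares with the product dimension
BRAND_ATTRS = ("brand", "subcategory", "category", "department")

# Made-up product dimension rows
product_dim = [
    {"product_key": 1, "description": "Cola 0.5l", "brand": "FizzCo",
     "subcategory": "Soft drinks", "category": "Beverages", "department": "Grocery"},
    {"product_key": 2, "description": "Cola 1.5l", "brand": "FizzCo",
     "subcategory": "Soft drinks", "category": "Beverages", "department": "Grocery"},
    {"product_key": 3, "description": "Oat flakes", "brand": "GrainCo",
     "subcategory": "Cereals", "category": "Breakfast", "department": "Grocery"},
]

def derive_brand_dimension(product_rows):
    """Project the product dimension onto the brand-level attributes."""
    seen = {}
    for row in product_rows:
        attrs = tuple(row[a] for a in BRAND_ATTRS)
        if attrs not in seen:
            seen[attrs] = len(seen) + 1  # hypothetical surrogate key
    return [dict(zip(BRAND_ATTRS, attrs), brand_key=key) for attrs, key in seen.items()]

brand_dim = derive_brand_dimension(product_dim)
print(brand_dim)
```

Because every brand row is a projection of product rows, column names and values match by construction, which is what the slide means by a strict subset.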
Order Management/1
Order management consists of several critical business processes (order, shipment, invoice processing, etc.) and measures (sales volume, invoice revenue, etc.)
Warehouse bus matrix
Order Management/2
Should there be a "Ship Date Key" in the fact table?
Can/should Order Date Key and Requested Ship Date Key be foreign keys to the same dimension table?
Order Date Dimension
  Order Date Key (PK)
  Order Date
  Order Date Day of Week
Order Transaction Fact
  Order Date Key (FK)
  Requested Ship Date Key (FK)
  Product Key (FK)
  Customer Ship To Key (FK)
  Sales Rep Key (FK)
  Deal Key (FK)
  Order Number (DD)
  Order Quantity
  Gross Order Dollar Amount
  Order Deal Discount Dollar Amount
  Net Order Dollar Amount
Requested Ship Date Dimension
  Requested Ship Date Key (PK)
  Requested Ship Date
Product Dimension
Deal Dimension
Sales Rep Dimension
Customer Ship To Dimension
Role-playing: When a single dimension appears several times in the same fact table,
  e.g., order date and requested ship date
  Should not be FKs to the same dimension table
    Joining both keys to the same table in SQL would require the two dates to be the same
    We might want to constrain the two dimensions differently
Might be one physical table, but each of the roles should be defined as a view with different labels
Order Date Dimension
  Order Date Key (PK)
  Order Date
  Order Date Day of Week
Requested Ship Date Dimension
  Requested Ship Date Key (PK)
  Requested Ship Date
  Requested Ship Date Day of Week
Date Dimension
  Date Key (PK)
  Date Day of Week
create view ... create view ...
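The two `create view` statements hinted at on the slide could look roughly like the following sketch, run here through Python's sqlite3 module. The table and column names follow the slide's labels in snake_case, but the exact definitions and the sample data are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- One physical date dimension
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT, day_of_week TEXT);

-- One view per role, each with its own column labels
CREATE VIEW order_date_dim AS
    SELECT date_key AS order_date_key,
           full_date AS order_date,
           day_of_week AS order_date_day_of_week
    FROM date_dim;
CREATE VIEW requested_ship_date_dim AS
    SELECT date_key AS requested_ship_date_key,
           full_date AS requested_ship_date,
           day_of_week AS requested_ship_date_day_of_week
    FROM date_dim;

CREATE TABLE order_fact (order_date_key INT, requested_ship_date_key INT, order_quantity INT);
INSERT INTO date_dim VALUES (1, '2012-10-01', 'Monday'), (2, '2012-10-05', 'Friday');
INSERT INTO order_fact VALUES (1, 2, 10), (1, 1, 4);
""")

# Each role is joined and constrained independently of the other
rows = con.execute("""
SELECT o.order_date, r.requested_ship_date, f.order_quantity
FROM order_fact f
JOIN order_date_dim o ON f.order_date_key = o.order_date_key
JOIN requested_ship_date_dim r ON f.requested_ship_date_key = r.requested_ship_date_key
WHERE o.order_date_day_of_week = 'Monday'
  AND r.requested_ship_date_day_of_week = 'Friday'
""").fetchall()
print(rows)
```

The query finds orders placed on a Monday with a requested ship date on a Friday, a constraint that could not be expressed if both keys were joined to a single instance of the date table.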
Multiple Hierarchies
Multiple hierarchies often coexist in a dimension table, especially for customer-oriented dimensions
  Natural geographic hierarchy
  Zip Code defines a second hierarchy
  Can have different numbers of levels
Should all be supported in a DW
Customer Ship To Dimension
  Customer Ship To Key (PK)
  Customer Ship To ID (NK)
  Customer Ship To Name
  Customer Ship To Address
  Customer Ship To City
  Customer Ship To State
  Customer Ship To Zip + 4
  Customer Ship To Zip Region
  Customer Ship To Zip Sectional Center
  Customer Bill To Name
  Customer Bill To Address Attributes
  Customer Credit Rating
  Customer URL
Fact Normalization
Fact normalization: Further normalize the fact table and collapse all measures into a single measure along with a dimension that identifies the type of the measure.
Only makes sense if
  the fact table is sparsely populated and no computations are made between measures of different types
  e.g., medical tests where different things are measured and data is sparse
Amount Dimension
  Amount Key (PK)
  Amount Type (gross, discount, ...)
Order Transaction Fact
  Amount (FK)
  Order Quantity
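Fact normalization is essentially an unpivot: one wide row with several measure columns becomes several narrow rows with one measure each. The sketch below assumes hypothetical column names and represents a missing measure as `None`, so that sparse rows produce fewer output rows.

```python
def normalize_fact(row, key_names, measure_names):
    """Turn one wide fact row into one narrow row per populated measure."""
    keys = {k: row[k] for k in key_names}
    return [
        {**keys, "amount_type": m, "amount": row[m]}
        for m in measure_names
        if row.get(m) is not None
    ]

# Made-up wide order row; the discount is not populated
wide_row = {"order_date_key": 1, "product_key": 7,
            "gross": 100.0, "discount": None, "net": 100.0}

narrow_rows = normalize_fact(
    wide_row,
    key_names=("order_date_key", "product_key"),
    measure_names=("gross", "discount", "net"),
)
for r in narrow_rows:
    print(r)
```

Computing something like gross minus discount now requires combining two rows, which is why the slide restricts this design to cases where measures of different types are not computed against each other.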
Collaboration with Hospital Meran
  BSc thesis of A. Heinisch
OncoNet is an application for the management of patients undergoing a cancer therapy
  Cancer therapy follows a treatment plan / protocol
Event type        Event description   Date
Data collection   Ct Thorax           Day 60, 100, 360, 720
Data collection   Ct Abdomen          Day 60, 100, 360, 720
Data collection   Hemogram            Day 110
Medication        Zofran              Day 1
Medication        Adriblastina        Day 1
Business process: Analysis of cancer therapies
  How many patients with normal blood pressure after ...
  Which dosages of drug A were successful to reduce ...
Granularity: Individual events of the
  Includes measurements, examinations, questionnaires,

Patient Dimension and Drug Dimension
Date Dimension
Patient Dimension
  Patient Key (SK)
  Patient ID
  Patient First Name
  Patient Last Name
  Patient Gender
  Patient Address
  Patient ZIP Code
  Patient Phone Number
  Patient Profession
  Patient First Language
  Patient Height
  Patient Weight
  Patient Body Surface Area
  Patient Place of Birth
  Patient Birthday Date Key
  Patient Death Date Key
  Patient First Admission Doctor
  Patient First Admission Area
  Patient Smoking Indicator
  Patient Cigarettes per Day
  Patient Alcohol Indicator
  Patient Alcohol Amount per Day
Drug Dimension
  Drug Key (SK)
  Drug ID
  Drug Name
  Drug Category
  Drug Active Substance
  Drug Manufacturer
  Drug Quantity Unit
  Drug Quantity Unit Description
  Drug Administration Type
  Drug Administration Location
  Drug Packaging
  Drug Packaging AIC Code
Normalized fact table
  One measure only
  Type of measure is described in Event and Investigation
Chemotherapy Event Fact
  Date Key (FK)
  Prescribing Date Key (FK)
  Relative Date Key (FK)
  Patient Key (FK)
  Therapy Key (FK)
  Drug Key (FK)
  Event Key (FK)
  Investigation Key (FK)
  Survey Group Key (FK)
  Numerical Value
  Textual Value
Date Key  Date  Full Date  Day of  Month  Year
1  Normal  30.05.06  Tuesday  Weekday  May  2006
Therapy Key  Therapy ID  Therapy
1  29854  NHL Chop 14 Profile
First Name  Last Name  Gender  Language  Weight  Height
1  522  Andreas  Heinisch  Male  German  88  185
Date Key  Patient Key  Therapy Key  Event Key  Investigation
1  1  1  1  1  32,7
1  1  1  1  2  11,4
1  1  1  1  3  97,8
Event Key  Event ID  Name  Type  Responsi
1  9267  Urgent
Group  Level  Label  Unit
1  Hemogram  MCH  1
1  Hemogram  HB  g/dl
3  Hemogram  MCV  pg

MEDical Data Warehousing and ANalysis (MEDAN)
  Collaboration between the Hospital Meran and the FUB
  Conduct research and create competences in the field of medical data warehousing and analysis
  Build a BI/DW solution
    Administrative DW
    Medical DW
  Develop and apply data analysis/mining techniques

Data sources in a health care environment
  Internal production systems (SQL, Excel, text files, ...)
  External information systems

Budgeting Using a Spreadsheet
Example: Budgeting in the controlling department
  Complex Excel spreadsheets have been used in the past

Budgeting Using QlikView/1
Example (contd.)
  A QlikView application to replace Excel
  Direct access to the data sources
  Data integration in QlikView
  No DW in place
Budgeting Using QlikView/2
Example (contd.): QlikView application
Budgeting Using QlikView/3
Example (contd.): QlikView application
Budgeting DW Solution/1
Difficult to convince decision makers to build a DW as the core of a BI solution
QlikView is mainly an analysis tool and cannot replace a DW
  Rather ad hoc
  Difficult to control data quality
  Not scalable for many applications, changing sources, etc.

Budgeting DW Solution/2
  Oracle ODI for the ETL part and the ODS
  Oracle DB for the DW
  QlikView and OBI for data analysis
Architecture (figure): Data sources (internal DBs, Excel files) -> ODS -> DW -> Data analysis
Budgeting DW Solution/3
Conceptual model of the data cube Hospital_Stay
  Each event stores a hospital stay of a patient
Similar model for data cube Services and data cube Transfer
Budgeting DW Solution/4
Logical model of the data cube Hospital_Stay
MEDAN Bottom Up Approach
Bottom-up approach
  Prototype of DM CONT with 3 cubes (hospital stay, services, transfer) for hospital Meran
  Deploy DM CONT in hospital Meran, then in other hospitals
  Repeat the same cycle for other DMs (DM Personnel, DM Pharmacy, DM Laboratory, etc.)
MEDAN Multi-valued Dimensions/1
Multi-valued dimensions
  Diagnoses and procedures are examples of multi-valued dimensions
    i.e., a patient typically has multiple diagnoses (up to about 10)


  Reserve multiple columns, one for each diagnosis
    Many empty cells, i.e., sparse fact table
  Use several facts for a single recovery (hospital stay)
    Increases the number of tuples in the fact table
  Bridge tables
Fact table with one column per diagnosis
  Health_record_numbers (FK)
  Health_record_year (FK)
  Patient_key (FK)
  Diagnosis_Key_1 (FK)
  Diagnosis_Key_2 (FK)
  Diagnosis_Key_3 (FK)
Fact table with one fact per diagnosis
  Health_record_numbers (FK)
  Health_record_year (FK)
  Patient_key (FK)
  Diagnosis_Key (FK)
MEDAN Multi-valued Dimensions/2
Bridge table for multi-valued dimensions
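The bridge table design itself is shown only as a figure in the slides, so the following is a generic sketch of the technique rather than the MEDAN schema. Each hospital stay points to a diagnosis group, and a bridge table lists the diagnoses in each group; all table names, column names, and data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE diagnosis_dim (diagnosis_key INTEGER PRIMARY KEY, description TEXT);
-- Bridge: one row per (group, diagnosis) pair
CREATE TABLE diagnosis_group_bridge (diagnosis_group_key INT, diagnosis_key INT);
-- One fact row per hospital stay, regardless of the number of diagnoses
CREATE TABLE hospital_stay_fact (patient_key INT, diagnosis_group_key INT, days_of_stay INT);

INSERT INTO diagnosis_dim VALUES (1, 'Diagnosis A'), (2, 'Diagnosis B'), (3, 'Diagnosis C');
INSERT INTO diagnosis_group_bridge VALUES (10, 1), (10, 2), (11, 3);
INSERT INTO hospital_stay_fact VALUES (100, 10, 5), (101, 11, 2);
""")

# Stays per diagnosis: the bridge fans each stay out to all of its diagnoses
stays_per_diagnosis = con.execute("""
SELECT d.description, COUNT(*) AS stays
FROM hospital_stay_fact f
JOIN diagnosis_group_bridge b ON f.diagnosis_group_key = b.diagnosis_group_key
JOIN diagnosis_dim d ON b.diagnosis_key = d.diagnosis_key
GROUP BY d.description
ORDER BY d.description
""").fetchall()

# Caveat: totals computed through the bridge count a stay once per diagnosis
(days_via_bridge,) = con.execute("""
SELECT SUM(f.days_of_stay)
FROM hospital_stay_fact f
JOIN diagnosis_group_bridge b ON f.diagnosis_group_key = b.diagnosis_group_key
""").fetchone()
(days_actual,) = con.execute("SELECT SUM(days_of_stay) FROM hospital_stay_fact").fetchone()

print(stays_per_diagnosis, days_via_bridge, days_actual)
```

The fact table stays dense and keeps one row per stay, but grand totals must be computed without the bridge (or with weighting factors) to avoid double counting.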
MEDAN Lessons Learned
Lessons learned
  Developing a BI platform is a process that takes years
  A well-designed and consistent DW is the foundation for BI
    QlikView is a tool for quick analyses; it cannot replace a DW
  Do not put everything in a single data mart
    Use one DM for one business query (set of closely related business queries)
  Different opinions on bottom-up vs. top-down, but bottom-up seems to have more acceptance
  Data modeling is difficult but very important
    Helps to get a conformed view on the business
      e.g., What is an admission?
    Different granularity by different users, e.g., Province, Hospital
Establishing a Business Intelligence Competence Center (BICC) is crucial
  Coordinates the BI project
  Combines business, IT and analytical skills
