&
Data Mining
Overview
Part 1: Data Warehouses
Data, Data everywhere
yet ...
❚ I can’t find the data I need
❙ data is scattered over the network
❙ many versions, subtle differences
❚ I can’t get the data I need
❙ need an expert to get the data
❚ I can’t understand the data I found
❙ available data poorly documented
❚ I can’t use the data I found
❙ results are unexpected
❙ data needs to be transformed from one form to another
What is a Data Warehouse?
[Barry Devlin]
Why Data Warehousing?
❚ Which are our lowest/highest margin customers?
❚ Who are my customers and what products are they buying?
❚ What is the most effective distribution channel?
❚ A subject-oriented, integrated,
time-variant, and non-volatile
collection of data in support of
management’s decision-making
process (Inmon, 1993).
Subject-oriented Data
❚ The warehouse is organized around
the major subjects of the enterprise
(e.g. customers, products, and sales)
rather than the major application
areas (e.g. customer invoicing, stock
control, and product sales).
Integrated Data
Time-variant Data
Benefits of Data Warehousing
❚ Competitive advantage
❚ Increased productivity of
corporate decision-makers
Comparison of OLTP Systems and Data Warehousing
Data Warehouse Queries
❚ The types of queries that a data
warehouse is expected to answer
range from the relatively simple to
the highly complex and depend
on the type of end-user access tools
used.
❚ Data homogenization
Problems of Data Warehousing
❚ High demand for resources
❚ Data ownership
❚ High maintenance
❚ Complexity of integration
Typical Architecture of a Data Warehouse
Data Mart
❚ A subset of a data warehouse that
supports the requirements of a
particular department or business
function.
❚ Characteristics include
❙ Focuses on only the requirements of one
department or business function.
❙ Does not normally contain detailed
operational data, unlike data warehouses.
❙ More easily understood and navigated.
Reasons for Creating a Data Mart
❚ To give users access to the data they need
to analyze most often.
Reasons for Creating a Data Mart
❚ To provide appropriately structured
data as dictated by the requirements
of the end-user access tools.
From the Data Warehouse to Data Marts
[Figure: pyramid running from Data (bottom) to Information (top). Levels: Organizationally Structured (Data Warehouse), Departmentally Structured, Individually Structured. Side scale: More to Less for History, Normalized, Detailed.]
Part 2: OLAP
Nature of OLAP Analysis
OLAP Applications
OLAP Applications - support for complex calculations
❚ Must provide a range of
powerful computational methods
such as those required by sales
forecasting, which uses trend
algorithms such as moving
averages and percentage
growth.
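The two trend calculations named above can be sketched in a few lines of Python. This is only an illustration: the function names and the quarterly sales figures are assumptions made for the example, not part of any particular OLAP product.

```python
def moving_average(values, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def percentage_growth(values):
    """Period-over-period growth in percent (each previous value must be non-zero)."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(values, values[1:])
    ]


# Hypothetical quarterly sales figures, for illustration only.
quarterly_sales = [100.0, 110.0, 121.0, 115.0]

smoothed = moving_average(quarterly_sales, window=2)
growth = percentage_growth(quarterly_sales)
```

A forecasting tool would typically smooth the series first and then extrapolate; here `smoothed` and `growth` simply expose the two building blocks.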
OLAP Applications - time intelligence
❚ Key feature of almost any analytical
application as performance is almost
always judged over time.
Representation of Multi-dimensional Data
❚ Example of three-dimensional
query.
❙ ‘What is the total revenue generated by
property sales for each type of property
(Flat or House) in each city, in each
quarter of 2004?’
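A query of this kind groups revenue by three dimensions at once. A minimal sketch in Python follows, assuming a flat list of sale records; the record layout, city names, and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical property-sale records for 2004:
# (property_type, city, quarter, revenue).
sales = [
    ("Flat", "Glasgow", "Q1", 150000),
    ("Flat", "Glasgow", "Q1", 120000),
    ("House", "Glasgow", "Q2", 300000),
    ("House", "London", "Q1", 450000),
    ("Flat", "London", "Q4", 200000),
]


def total_revenue(records):
    """Sum revenue for every (property type, city, quarter) combination."""
    totals = defaultdict(int)
    for property_type, city, quarter, revenue in records:
        totals[(property_type, city, quarter)] += revenue
    return dict(totals)


revenue_by_cell = total_revenue(sales)
```

Each key of `revenue_by_cell` names one combination of the three dimensions, and its value is the total revenue for that combination.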
Representation of Multi-dimensional Data
❚ Cube represents data as cells in
an array.
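One way to picture the cube is as a three-dimensional array indexed by position along each dimension. The sketch below uses nested Python lists; the dimension members and helper names are assumptions chosen for the example.

```python
# Hypothetical dimension members; list positions are the array indices.
property_types = ["Flat", "House"]
cities = ["Glasgow", "London"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

# cube[t][c][q] is one cell: revenue for type t, city c, quarter q.
cube = [[[0 for _ in quarters] for _ in cities] for _ in property_types]


def set_cell(property_type, city, quarter, revenue):
    """Store a revenue figure in the cell addressed by the three members."""
    t = property_types.index(property_type)
    c = cities.index(city)
    q = quarters.index(quarter)
    cube[t][c][q] = revenue


def get_cell(property_type, city, quarter):
    """Read the revenue figure from one cell."""
    t = property_types.index(property_type)
    c = cities.index(city)
    q = quarters.index(quarter)
    return cube[t][c][q]


def slice_quarter(quarter):
    """Fix the time dimension: returns a 2-D table (rows: types, columns: cities)."""
    q = quarters.index(quarter)
    return [
        [cube[t][c][q] for c in range(len(cities))]
        for t in range(len(property_types))
    ]


set_cell("Flat", "Glasgow", "Q1", 270000)
set_cell("House", "London", "Q1", 450000)
q1_table = slice_quarter("Q1")
```

Fixing one dimension, as `slice_quarter` does, reduces the cube to an ordinary two-dimensional table.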
Multi-dimensional Data
❚ Measure - sales (actual, plan, variance)
❚ Dimensions: Product, Region, Time
❚ Hierarchical summarization paths
[Figure: data cube with a Region axis (members include W, S)]
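A hierarchical summarization path can be illustrated as a roll-up along the Time dimension, from month to quarter to year. The sketch below assumes a hypothetical month-level sales measure; the function names and figures are made up for the example.

```python
from collections import defaultdict

# Hypothetical monthly sales: (year, month) -> amount.
monthly_sales = {(2004, month): 10 * month for month in range(1, 13)}


def month_to_quarter(key):
    """Parent of a (year, month) cell is its (year, quarter) cell."""
    year, month = key
    return (year, (month - 1) // 3 + 1)


def quarter_to_year(key):
    """Parent of a (year, quarter) cell is its year."""
    year, _quarter = key
    return year


def roll_up(cells, parent_of):
    """Aggregate every cell into its parent one level up the hierarchy."""
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[parent_of(key)] += value
    return dict(totals)


quarterly_sales = roll_up(monthly_sales, month_to_quarter)
yearly_sales = roll_up(quarterly_sales, quarter_to_year)
```

The same `roll_up` helper would serve a Region or Product hierarchy if given a different `parent_of` mapping.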
Data Mining
❚ The process of extracting valid,
previously unknown, comprehensible,
and actionable information from large
databases and using it to make
crucial business decisions
(Simoudis, 1996).
Data Mining
❚ Accurate, reliable conclusions
normally require large volumes
of data.
Data Mining
❚ Data mining can provide huge
paybacks for companies that
have made a significant
investment in data warehousing.
Examples of Applications of Data Mining
❚ Retail / Marketing
❙ Identifying buying patterns of
customers
❙ Finding associations among
customer demographic
characteristics
❙ Predicting response to mailing
campaigns
❙ Market basket analysis
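Market basket analysis is usually expressed through the support and confidence of item associations. The sketch below computes both over a handful of hypothetical baskets; the items and the helper names are assumptions for illustration.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


bread_and_milk = support({"bread", "milk"}, baskets)
bread_implies_milk = confidence({"bread"}, {"milk"}, baskets)
```

Here `bread_and_milk` measures how often the pair appears together, while `bread_implies_milk` measures how often a basket with bread also holds milk.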
Examples of Applications of Data Mining
❚ Banking
❙ Detecting patterns of fraudulent
credit card use
❙ Identifying loyal customers
❙ Predicting customers likely to
change their credit card affiliation
❙ Determining credit card spending
by customer groups
Examples of Applications of Data Mining
❚ Insurance
❙ Claims analysis
❙ Predicting which customers will buy new
policies
❚ Medicine
❙ Characterizing patient behavior to
predict surgery visits
❙ Identifying successful medical therapies
for different illnesses
Data Mining Operations
❚ The four main operations are:
❙ Predictive modeling
❙ Database segmentation
❙ Link analysis
❙ Deviation detection
Data Mining Techniques
❚ Techniques are specific
implementations of the data
mining operations.
Data Mining Operations and Associated Techniques
Predictive Modeling
❚ Similar to the human learning
experience
❙ uses observations to form a model of the
important characteristics of some
phenomenon.
Predictive Modeling
❚ Applications of predictive modeling
include customer retention
management, credit approval, cross
selling, and direct marketing.
Example of Classification using Tree Induction
Predictive Modeling - Value Prediction
❚ Used to estimate a continuous
numeric value that is associated with
a database record.
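One common way to estimate a continuous value is least-squares linear regression. The slides do not name a specific technique here, so the choice of regression, the function name, and the customer figures below are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records: years as a customer vs. annual spend.
years_as_customer = [1, 2, 3, 4]
annual_spend = [120.0, 140.0, 160.0, 180.0]

slope, intercept = fit_line(years_as_customer, annual_spend)

# Estimated spend for a record with five years as a customer.
predicted_spend = slope * 5 + intercept
```

The fitted line is the model; applying it to a new record yields the estimated numeric value.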
❚ Applications of database
segmentation include
customer profiling, direct
marketing, and cross selling.
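Customer profiling by segmentation can be sketched with a small clustering routine. The example below uses one-dimensional k-means, which is an assumption: the slides do not say which clustering technique is meant, and the spend figures and starting centroids are hypothetical.

```python
def kmeans_1d(values, initial_centroids, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest centroid, then re-average."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical annual spend per customer.
annual_spend = [100, 120, 110, 900, 950, 1000]

centroids, segments = kmeans_1d(annual_spend, initial_centroids=[0, 1000])
```

With these figures the customers fall into a low-spend and a high-spend segment, which could then be profiled or targeted separately.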
Example of Database Segmentation using a Scatterplot
Link Analysis
❚ Aims to establish links
(associations) between records, or
sets of records, in a database.
Link Analysis - Similar Time Sequence Discovery
❚ Finds links between two sets of
data that are time-dependent,
and is based on the degree of
similarity between the patterns
that both time series
demonstrate.
❙ e.g. Within three months of buying
property, new home owners will
purchase goods such as cookers,
freezers, and washing machines.
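The home-owner example can be checked against data by counting how many property buyers purchase goods inside the time window. This sketch covers only that windowed count, not a full time-series similarity measure. The customer ids and dates are hypothetical, and "three months" is approximated as 90 days.

```python
from datetime import date, timedelta

# Assumption: "three months" approximated as 90 days.
WINDOW = timedelta(days=90)

# Hypothetical data: property purchase date per customer.
property_purchases = {
    "c1": date(2004, 1, 10),
    "c2": date(2004, 3, 1),
    "c3": date(2004, 5, 20),
}

# Hypothetical goods purchases: (customer, item, date).
goods_purchases = [
    ("c1", "cooker", date(2004, 2, 1)),
    ("c1", "freezer", date(2004, 9, 1)),
    ("c2", "washing machine", date(2004, 4, 15)),
    ("c3", "cooker", date(2005, 1, 5)),
]


def followers_within_window(first_events, later_events, window):
    """Customers with at least one later event inside `window` after their first event."""
    followers = set()
    for customer, _item, when in later_events:
        start = first_events.get(customer)
        if start is not None and start <= when <= start + window:
            followers.add(customer)
    return followers


followers = followers_within_window(property_purchases, goods_purchases, WINDOW)
share = len(followers) / len(property_purchases)
```

A high `share` would support the link between buying property and buying household goods shortly afterwards.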
Deviation Detection
❚ Relatively new operation in terms of
commercially available data mining
tools.
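A very simple form of deviation detection flags values that lie far from the mean. The z-score rule, the threshold, and the transaction amounts below are assumptions for illustration; commercial tools may use quite different statistics or visualization techniques.

```python
from statistics import mean, pstdev


def find_deviations(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [value for value in values if abs(value - mu) / sigma > threshold]


# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 250.0]

outliers = find_deviations(amounts)
```

Because a large outlier also inflates the standard deviation, this rule is crude on small samples; it is meant only to show the idea of flagging records that depart from the norm.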
Deviation Detection