Cluster Analysis
Jacob Eskildsen
Overview
Idea & purpose
Similarity measures
Dissimilarity measures
Hierarchical clustering
Non-hierarchical clustering
QEM
Cluster Analysis
Jacob Eskildsen
[Figure: overview of multivariate methods. Recoverable labels: variables, cases/respondents, objects; cluster analysis; metric and nonmetric data; multidimensional scaling; correspondence analysis; canonical correlation; MANOVA.]
Euclidean distance
$$d(x, y) = \sqrt{(x - y)'(x - y)}$$
Statistical distance
$$d(x, y) = \sqrt{(x - y)' A (x - y)}, \quad \text{ordinarily } A = S^{-1}$$
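A minimal Python sketch of the two distances above, assuming NumPy and a small hypothetical data matrix `data` with objects as rows. The function names are illustrative only. With A = S^{-1}, where S is the sample covariance matrix, the statistical distance is the Mahalanobis distance; passing the identity matrix as A recovers the Euclidean distance.

```python
import numpy as np

def euclidean_distance(x, y):
    """d(x, y) = sqrt((x - y)'(x - y))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ diff))

def statistical_distance(x, y, A):
    """d(x, y) = sqrt((x - y)' A (x - y)) for a positive definite matrix A."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.asarray(A, dtype=float) @ diff))

# Hypothetical data: 4 objects (rows) measured on p = 2 variables (columns).
data = np.array([
    [2.0, 4.0],
    [3.0, 6.0],
    [5.0, 7.0],
    [6.0, 11.0],
])

S = np.cov(data, rowvar=False)  # sample covariance matrix (divisor n - 1)
A = np.linalg.inv(S)            # the usual choice A = S^{-1}

x, y = data[0], data[3]
print(euclidean_distance(x, y))       # sqrt(16 + 49) = sqrt(65)
print(statistical_distance(x, y, A))  # Mahalanobis distance between x and y
```

Weighting by S^{-1} down-weights variables with large variance and accounts for correlation between variables, which the plain Euclidean distance ignores.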
Minkowski metric
$$d(x, y) = \left[ \sum_{i=1}^{p} |x_i - y_i|^m \right]^{1/m}$$
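The Minkowski metric can be sketched in plain Python as follows (the function name is hypothetical). Setting m = 1 gives the city-block distance and m = 2 gives the Euclidean distance; the check on m reflects that the formula defines a metric only for m >= 1.

```python
def minkowski_distance(x, y, m):
    """d(x, y) = (sum_i |x_i - y_i|^m)^(1/m); a metric for m >= 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of variables")
    if m < 1:
        raise ValueError("m must be at least 1")
    return sum(abs(xi - yi) ** m for xi, yi in zip(x, y)) ** (1.0 / m)

x, y = (0, 0), (3, 4)
print(minkowski_distance(x, y, 1))  # city-block: 3 + 4 = 7.0
print(minkowski_distance(x, y, 2))  # Euclidean: sqrt(9 + 16) = 5.0
```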
Canberra metric
$$d(x, y) = \sum_{i=1}^{p} \frac{|x_i - y_i|}{x_i + y_i}$$
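A small sketch of the Canberra metric in plain Python (hypothetical function name). It assumes nonnegative variables, as the denominator x_i + y_i suggests. It also assumes a convention that is not taken from the slides: a term with x_i = y_i = 0 contributes zero instead of being undefined.

```python
def canberra_distance(x, y):
    """d(x, y) = sum_i |x_i - y_i| / (x_i + y_i) for nonnegative x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of variables")
    total = 0.0
    for xi, yi in zip(x, y):
        if xi < 0 or yi < 0:
            raise ValueError("the Canberra metric assumes nonnegative variables")
        if xi + yi == 0:
            continue  # assumed convention: a 0/0 term contributes nothing
        total += abs(xi - yi) / (xi + yi)
    return total

print(canberra_distance((1, 2, 0), (3, 2, 0)))  # 2/4 + 0/4 + (skipped) = 0.5
```

Because each term is divided by x_i + y_i, every variable contributes at most 1, so a difference between small values counts as much as a proportionally equal difference between large ones.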
Czekanowski coefficient
$$d(x, y) = 1 - \frac{2 \sum_{i=1}^{p} \min(x_i, y_i)}{\sum_{i=1}^{p} (x_i + y_i)}$$
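The Czekanowski coefficient can be sketched the same way, again assuming nonnegative variables (hypothetical function name). It equals 0 for identical profiles and 1 when min(x_i, y_i) = 0 for every i.

```python
def czekanowski_coefficient(x, y):
    """d(x, y) = 1 - 2 * sum_i min(x_i, y_i) / sum_i (x_i + y_i), nonnegative data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of variables")
    if any(v < 0 for v in (*x, *y)):
        raise ValueError("the Czekanowski coefficient assumes nonnegative variables")
    shared = sum(min(xi, yi) for xi, yi in zip(x, y))
    total = sum(xi + yi for xi, yi in zip(x, y))
    if total == 0:
        raise ValueError("undefined when both objects are all zeros")
    return 1.0 - 2.0 * shared / total

print(czekanowski_coefficient((1, 2, 0), (3, 2, 0)))  # 1 - 2*3/8 = 0.25
print(czekanowski_coefficient((1, 2, 3), (1, 2, 3)))  # identical profiles: 0.0
```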
Average linkage
$$d_{(UV)W} = \frac{\sum_i \sum_k d_{ik}}{N_{(UV)} N_W}$$
The quantity $d_{ik}$ is the distance between object $i$ in cluster $(UV)$ and object $k$ in cluster $W$, and $N_{(UV)}$ and $N_W$ are the numbers of items in the two clusters. Average linkage is the default method in SPSS, where it is listed as between-groups linkage.
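To show how the average-linkage formula drives a hierarchical procedure, here is a deliberately naive agglomerative sketch in plain Python. It assumes a symmetric distance matrix `D` given as nested lists (hypothetical data); the function names are illustrative. It starts with every object in its own cluster and repeatedly merges the two clusters with the smallest average-linkage distance, recomputing d_(UV)W from the original d_ik at each step.

```python
from itertools import combinations

def average_linkage(cluster_uv, cluster_w, D):
    """d_(UV)W = sum_i sum_k d_ik / (N_(UV) * N_W), using the original distances D."""
    total = sum(D[i][k] for i in cluster_uv for k in cluster_w)
    return total / (len(cluster_uv) * len(cluster_w))

def agglomerate(D):
    """Naive agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of object indices.
    """
    clusters = [(i,) for i in range(len(D))]
    history = []
    while len(clusters) > 1:
        # Find the pair of current clusters with the smallest average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], D),
        )
        dist = average_linkage(clusters[a], clusters[b], D)
        history.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return history

# Hypothetical symmetric distance matrix for 4 objects.
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]

for step in agglomerate(D):
    print(step)
```

For this matrix, objects 0 and 1 merge first at distance 2, then objects 2 and 3 at distance 4, and finally the two pairs at (6 + 10 + 5 + 9)/4 = 7.5.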
Final Cluster Centers (Clusters 1 and 2)
Variables: Seminar in Descriptive Statistics, Economics, Business Computing, Business Statistics, Mathematics, Managerial Economics
Center values: 8.4 7.6 7.7 7.9 8.2 8.8 7.3 5.7 6.3 4.2 5.8 6.6

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.
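Final cluster centers and per-variable F tests of this kind come from a non-hierarchical (k-means type) procedure. The sketch below assumes NumPy and hypothetical scores for eight cases on two variables. It runs plain Lloyd-style k-means from user-supplied starting centers (an assumption for this sketch, not necessarily how SPSS chooses its initial centers) and then computes the ordinary one-way ANOVA F for each variable. The function names are illustrative.

```python
import numpy as np

def kmeans(X, init_centers, n_iter=100):
    """Lloyd-style k-means: alternate nearest-center assignment and center updates."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every case to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # New center = mean of the cases assigned to it (keep old center if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def anova_f(X, labels):
    """One-way ANOVA F per variable: between-cluster MS / within-cluster MS."""
    n, p = X.shape
    groups = np.unique(labels)
    k = len(groups)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(p)
    ss_within = np.zeros(p)
    for g in groups:
        Xg = X[labels == g]
        group_mean = Xg.mean(axis=0)
        ss_between += len(Xg) * (group_mean - grand_mean) ** 2
        ss_within += ((Xg - group_mean) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical grades for 8 cases on 2 variables.
X = np.array([
    [8.0, 7.5], [8.5, 8.0], [9.0, 7.0], [8.2, 7.8],
    [5.0, 4.0], [5.5, 4.5], [6.0, 4.2], [5.2, 3.8],
])

# Start from cases 0 and 4 (an arbitrary choice for this sketch).
centers, labels = kmeans(X, X[[0, 4]])
F = anova_f(X, labels)
print("final cluster centers:\n", centers)
print("F per variable (descriptive only):", F)
```

Because k-means places the cases so that the clusters are as far apart as possible, the resulting F values are large almost by construction; this is why the note above treats them as descriptive rather than as valid tests of equal cluster means.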