Documente Academic
Documente Profesional
Documente Cultură
Variables
Education: percent completing college Income per Capita: in 2011 dollars Urban Percentage: percent living in urban areas Population Density: people per square mile Non-English Language: percent that dont speak English Median Age: in years Unemployment Rate: for 2012 Percent Female: of general population Crime Rate: violent crimes per 100,000 Manufacturing: % of GDP Infant Mortality: 2006 Poverty Rate: percentage below poverty level
Urban Percentage
Percent Female
Poverty Rate
Star Glyphs
Data Input
10
11
Cluster Analysis
Method for assigning objects to groups so that objects in the same group are more similar to each other than they are to objects in other groups.
Some questions: 1. How many natural groups are there? 2. Which objects belong to each group?
12
Data Input
13
Analysis Options
14
15
16
17
Dendrogram - 4 Clusters
18
19
20
Method of k-Means
1. Starts with a set of typical objects (seeds) for each cluster. 2. Each object is assigned to group with nearest seed. 3. Centroids of groups are computed. 4. Objects are assigned to different groups if nearer to a different centroid. 5. Steps 3 and 4 are repeated until no change.
21
Method of k-Means
22
Seeds
Row 5 = California Row 32 = New York
23
24
Classification
The classification problem deals with the assignment of objects to known groups. Goals include:
1. Understanding what differentiates the groups. 2. Predicting group membership for unclassified objects.
25
Methods
Discriminant Analysis (on the Centurion Relate menu) based on Fishers linear discriminant functions.
Neural Network Classifier (on the Centurion Relate menu) based on a Bayesian classifier with priors and costs. Uniwin (companion program to Statgraphics) includes methods for dealing with qualitative factors.
26
D j d j1 Z1 d j 2 Z 2 ... d jp Z p
These functions are derived so as to maximize the separation of the groups. If there are g groups, there are g-1 discriminant functions.
27
Example
28
Data Input
29
Analysis Options
30
All Variables
31
32
Backward selection
33
Classification Functions
These functions create a score for each group:
C j c j1 X 1 c j 2 X 2 ... c jp X p c j 0
Objects are assigned to whatever group has the largest value of (Cj * priorj) where priorj is the prior probability of belonging to group j.
34
Classification Functions
35
Classification Summary
36
Classification Table
37
Validation Set
May leave some of the objects out of the training set and use them to validate the results.
38
39
(X 0.001) 4
density
3 2 1 0 35 55 12 75 95 115 8 16 20
24
(X 0.001) 5 4 3 2 1 0
density
35
Urban Percentage
Urban Percentage
40
Schematic Diagram
41
Data Input
42
Analysis Options
43
Classification Summary
44
45
Some Improvement
46
Classification Plot
47
Classification Plot
48
49
50
New Variables
51
52
Dendrogram
53
Uniwin Clusters
54
Uniwin Classification
55
Classification Results
Wrongly classified only Nevada (only cattle state to vote for Obama).
56
More Information
Statgraphics Centurion: www.statgraphics.com
Uniwin Plus: www.statgraphics.fr or www.sigmaplus.fr Or send e-mail to info@statgraphics.com
Follow us on