Documente Academic
Documente Profesional
Documente Cultură
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Representation
Is the mapping of information to a visual format
Data objects, their attributes, and the relationships
among data objects are translated into graphical
elements such as points, lines, shapes, and
colors.
Example:
8/05/2005
Representation
Tan,Steinbach, Kumar
8/05/2005
Arrangement
Is the placement of visual elements within a
display
Can make a large difference in how easy it is to
understand the data
Example:
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Histogram
Usually shows the distribution of values of a single variable
Divide the values into bins and show a bar plot of the number of
objects in each bin.
The height of each bar indicates the number of objects
Shape of histogram depends on the number of bins
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Two-Dimensional Histograms
Show the joint distribution of the values of two
attributes
Example: petal width and petal length
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Five-Number Summary
Five-Number Summary: min, Q1, Q2, Q3, Max
Interquartile range (IQR):
IQR = Q3-Q1
Tan,Steinbach, Kumar
8/05/2005
Five-Number Summary
Find the 5 Number Summary of the following numbers:
3
12
40
14
18
15
17
15
17
18
40
12
14
12
14
15
17
18
40
12
14
15
17
18
40
Tan,Steinbach, Kumar
12
14
15
17
18
8/05/2005
40
#
Five-Number Summary
3 - Smallest number in the set
9 - Median between the smallest number
and the median
14 - Median of the entire set
17 - Median between the largest number
and the median
40 - Largest number in the set
Tan,Steinbach, Kumar
12
14
15
17
18
8/05/2005
40
Five-Number Summary
4
16
16
38
18
50
24
24
24
29
27
41
29
36
33
18
36
37
33
38
37
41
24
42
27
45
45
50
Tan,Steinbach, Kumar
Smallest
42
24
Median
Median
Median
Largest
33
39.5
50
8/05/2005
Five-Number Summary
4
19
11
21
13
10
11
13
10
14
20
15
14
18
15
19
18
20
3 Kumar
Tan,Steinbach,
Smallest = 2
Median = 5.5
Median = 10.5
Median = 16.5
Largest = 21
8/05/2005
Five-Number Summary
A 5 Number Summary divides your data into four quarters.
12
14
15
17
18
1st
2nd
3rd
4th
Quarter
Quarter
Quarter
Quarter
Tan,Steinbach, Kumar
8/05/2005
40
12
14
15
17
18
40
Tan,Steinbach, Kumar
8/05/2005
12
14
15
17
18
40
Tan,Steinbach, Kumar
8/05/2005
12
14
15
17
18
40
Tan,Steinbach, Kumar
8/05/2005
- 12
+ 12
12
14
15
17
18
40
IQR = 8
-3
Tan,Steinbach, Kumar
39
OUTLIER
Introduction to Data Mining
8/05/2005
Box Plots
Invented by J. Tukey
Another way of displaying the distribution of data
Following figure shows the basic part of a box plot
outlier
90th percentile
75th percentile
50th percentile
25th percentile
10th percentile
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005
Scatter plots
Attributes values determine the position
Two-dimensional scatter plots most common, but can
have three-dimensional scatter plots
Often additional attributes can be displayed by using
the size, shape, and color of the markers that
represent the objects
It is useful to have arrays of scatter plots can
compactly summarize the relationships of several pairs
of attributes
Tan,Steinbach, Kumar
8/05/2005
Tan,Steinbach, Kumar
8/05/2005