
Basic Plots

Univariate, bivariate, multivariate


Histogram, boxplot, dotplot, barchart, spine plot
Scatterplot, density plots, mosaic plot
Parallel coordinate plot, profile plots
Maps
Time series plots
Data and its shapes
Data comes in a lot of different formats
We will assume that we can always get it
into a shape (spread-sheet like) with
headers at the top
columns for each piece of information
and rows for each object
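As a minimal sketch of that spreadsheet-like shape, the snippet below reads a small text table with a header row, one column per piece of information, and one row per object. The four bill and tip pairs are the ones listed later in these slides; the column names `bill` and `tip` and the use of Python's `csv` module are assumptions made for illustration.

```python
import csv
import io

# Hypothetical spreadsheet-like text: a header row, then one row per object.
# The column names are chosen here for illustration only.
raw = """bill,tip
16.99,1.01
10.34,1.66
21.01,3.5
23.68,3.31
"""

# DictReader uses the header row to name the columns of every row.
rows = [
    {"bill": float(r["bill"]), "tip": float(r["tip"])}
    for r in csv.DictReader(io.StringIO(raw))
]

# A column is one piece of information collected across all objects.
tips = [r["tip"] for r in rows]
print(len(rows), "rows; tip column:", tips)
```

Once the data is in this shape, each basic plot only needs one or two columns pulled out of it.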
Univariate
A dotplot is used for real-valued variables.
A dot is positioned along an axis to represent the data value.
[Figure: dotplot of % capitals (%Cap) on an axis from 0 to 15; listed values: 14, 12.8, 13, 1.3, 13, 10]
Univariate
Histograms are used for real-valued variables.
Values are binned and the count is displayed by a rectangle.
[Figure: two histograms of Tips ($) from 0 to 10, with Frequency on the vertical axis: one with breaks at $1, one with breaks at 10c]
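The effect of the break width can be sketched in a few lines. This is an illustration only: the helper `bin_counts` is a hypothetical name, and the tip list mixes the four tips listed elsewhere in these slides with made-up values. Amounts are held in cents so that bin edges stay exact integers.

```python
from collections import Counter

def bin_counts(values_cents, width_cents):
    """Map each bin's lower edge (in cents) to the number of values in it."""
    counts = Counter((v // width_cents) * width_cents for v in values_cents)
    return dict(sorted(counts.items()))

# First four tips come from the slides; the last four are made up.
tips_cents = [101, 166, 350, 331, 200, 200, 300, 250]

by_dollar = bin_counts(tips_cents, 100)  # breaks at $1
by_dime = bin_counts(tips_cents, 10)     # breaks at 10c

print("breaks at $1:", by_dollar)
print("breaks at 10c:", by_dime)
```

Narrower bins keep more detail, such as a pile-up at round amounts, at the price of a noisier picture.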
Univariate
Boxplots are used for real-valued variables.
The data values are summarized by 5 numbers: min, Q1, median, Q3, max.
The boxplot displays just these 5 numbers.
[Figure: boxplot of Tips, with Min 1, Q1 2, Median 2.9, Q3 3.6, Max 10]
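A five-number summary can be computed as in the sketch below. The sample is hypothetical, and the quartile rule (Python's `statistics.quantiles` with `method="inclusive"`) is an assumption: plotting packages use several slightly different quartile definitions, so other software may report different Q1 and Q3 for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Hypothetical tips in dollars, deliberately unsorted.
sample = [2.9, 10.0, 1.0, 3.6, 2.0, 5.0, 1.5, 3.0, 2.5]

print(five_number_summary(sample))
```

Everything else about the sample, such as gaps or multiple modes, is hidden by this summary, which is why a boxplot is often paired with a dotplot or histogram.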
Univariate
Barcharts are used for categorical data.
The count for each category is represented by the height of a rectangle.
Gender of person paying the bill:
F    M
87   157
[Figure: barchart of counts (0 to 150) for F and M]
Univariate
Spine plots are used for categorical variables.
The count for each category is represented by the width of a rectangle.
Most useful when there are two or more variables.
Gender:
F    M
87   157
[Figure: spine plot of Gender, F and M]
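The difference between the two encodings can be sketched with the gender counts from these slides: a barchart uses each count directly as a bar height, while a spine plot turns each count into a share of a fixed total width. The variable names are illustrative.

```python
# Counts of the gender of the person paying the bill, from the slides.
counts = {"F": 87, "M": 157}

# Barchart: the height of each rectangle is the count itself.
heights = dict(counts)

# Spine plot: all rectangles have the same height; the width of each is
# its share of the total, so the widths add up to 1.
total = sum(counts.values())
widths = {category: n / total for category, n in counts.items()}

for category in counts:
    print(category, "height:", heights[category],
          "width:", round(widths[category], 3))
```

Fixing the total width is what makes the spine plot a good starting point for splitting each bar by a second variable.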
Bivariate
Scatterplots place a dot representing a pair of numbers on a Cartesian plane.
Bill    Tip
16.99   1.01
10.34   1.66
21.01   3.5
23.68   3.31
...     ...
[Figure: scatterplot of Tip (0 to 10) against Total Bill (0 to 50), r=0.68]
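The r shown with the scatterplot is presumably the Pearson correlation coefficient; that reading is an assumption, as is the helper name `pearson_r`. The sketch below computes it from scratch for the four listed pairs. Four rows cannot be expected to reproduce the r=0.68 reported for the full data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The four (bill, tip) pairs listed on the slide.
bill = [16.99, 10.34, 21.01, 23.68]
tip = [1.01, 1.66, 3.5, 3.31]

r = pearson_r(bill, tip)
print("r for the four listed pairs:", round(r, 2))
```

A single number like r only measures linear association, so it is a complement to the scatterplot rather than a replacement for it.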
Bivariate
Density plots: hexagonal grids (Carr)
[Figure: hexagonally binned plot of Tip against Total Bill (10 to 50), with a Counts legend]
Bivariate
A mosaic plot represents a two-way table of categorical variables.
It starts from a spine plot and divides the bars according to counts of a second variable.
     Thu  Fri  Sat  Sun
F    32   9    28   18
M    30   10   59   58
[Figure: mosaic plot of Gender (F, M) by Day (Thu, Fri, Sat, Sun)]
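The construction can be sketched numerically with the two-way table from this slide. The sketch assumes Day defines the spine (column widths) and Gender divides each column, which matches the axis labels in the figure; the variable names are illustrative.

```python
# Two-way table of counts from the slide: gender by day.
table = {
    "F": {"Thu": 32, "Fri": 9, "Sat": 28, "Sun": 18},
    "M": {"Thu": 30, "Fri": 10, "Sat": 59, "Sun": 58},
}
days = ["Thu", "Fri", "Sat", "Sun"]
total = sum(table[g][d] for g in table for d in days)

widths = {}  # spine plot step: width of each day's column
tiles = {}   # area of each (day, gender) tile in a unit square

for d in days:
    day_total = sum(table[g][d] for g in table)
    widths[d] = day_total / total
    for g in table:
        # Second step: split the column by the share of each gender.
        height = table[g][d] / day_total
        tiles[(d, g)] = widths[d] * height

for key, area in tiles.items():
    print(key, round(area, 3))
```

Each tile's area ends up proportional to its cell count, so unusually large or small cells stand out visually.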
Barcharts with two variables
[Figure: two barcharts of counts by Day (Thu, Fri, Sat, Sun) split by gender (Male, Female): a stacked barchart and a side-by-side barchart]
Multivariate
A parallel coordinate plot changes from orthogonal Cartesian axes to parallel axes.
Bill    Tip
16.99   1.01
10.34   1.66
21.01   3.5
23.68   3.31
...     ...
[Figure: two parallel coordinate plots of Standardized Values against the Variables Bill and Tip]
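Because each variable gets its own parallel axis, the values are usually put on a common scale first. The slides do not say which standardization is used; the sketch below assumes z-scores (subtract the mean, divide by the sample standard deviation), and the helper name `standardize` is illustrative.

```python
import statistics

def standardize(values):
    """Convert values to z-scores so different variables share one scale."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# The four (bill, tip) rows listed on the slide.
bill = [16.99, 10.34, 21.01, 23.68]
tip = [1.01, 1.66, 3.5, 3.31]

z_bill = standardize(bill)
z_tip = standardize(tip)

# Each row becomes one line: its height on the Bill axis, then on the Tip axis.
profiles = list(zip(z_bill, z_tip))
for start, end in profiles:
    print(round(start, 2), "->", round(end, 2))
```

Lines that stay roughly level between two axes indicate positive association; many crossing lines indicate negative association.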


Parallel coordinate plots
Look for patterns in the direction of the lines.
[Figure: parallel coordinate plot of Standardized Values against the Variables Bill, Tip, Gender]
Parallel coordinate plots
Measurements on Beetles
What patterns do you see here?
[Figure: parallel coordinate plot of Standardized Values against the Variables tars1, aede3, aede2, aede1, head, tars2]
Maps
Convention: North at top
The problems with treating longitude as a numerical value
Aspect ratio of latitude to longitude
Small regions/areas and reading information
Longitude as a numerical value and aspect ratio of longitude to latitude
Can you imagine what the world would look like if the vertical and horizontal plot space were equal?
This location is both -180 and 180.
[Figure: world map with longitude from -150 to 150 on the horizontal axis and latitude from -50 to 50 on the vertical axis]
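Both issues can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the scale factor assumes a spherical Earth, on which one degree of longitude covers cos(latitude) times the ground distance of one degree of latitude.

```python
import math

def wrap_longitude(lon_deg):
    """Map any longitude in degrees onto the interval [-180, 180)."""
    return (lon_deg + 180.0) % 360.0 - 180.0

def lon_to_lat_scale(lat_deg):
    """Ground length of 1 degree of longitude relative to 1 degree of
    latitude, assuming a spherical Earth."""
    return math.cos(math.radians(lat_deg))

# -180 and 180 name the same meridian, so a numeric axis splits one place in two.
print(wrap_longitude(180.0), wrap_longitude(-180.0))

# Plotting raw degrees with equal units stretches regions far from the equator
# horizontally; this factor is the correction at a few latitudes.
for lat in (0, 30, 60):
    print(lat, round(lon_to_lat_scale(lat), 3))
```

A common quick fix for a regional map is to set the plot's aspect ratio from this factor at the region's central latitude.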
Small areas/regions
2004 election results on map: red=Republican, blue=Democrat
Cartogram of 2004 election results
http://www-personal.umich.edu/~mejn/election/
Time series plots
Temporal scale: days of the week need to be in conventional order, lines, ...
Lines indicate temporal dependency
What’s wrong with this plot?
[Figure: barchart of Count (0 to 80) by Day, with the days ordered Fri, Sat, Sun, Thu]
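The ordering problem typically arises because software sorts category labels alphabetically. A minimal sketch of the fix is to sort by position in an explicit day order instead. The counts are the column totals of the gender-by-day table in these slides; starting the week on Monday is an assumption.

```python
# Conventional order of the days; starting on Monday is an assumption.
DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Counts per day, keyed in the alphabetical order many tools default to.
counts = {"Fri": 19, "Sat": 87, "Sun": 76, "Thu": 62}

alphabetical = sorted(counts)
ordered = sorted(counts.items(), key=lambda item: DAY_ORDER.index(item[0]))

print("alphabetical:", alphabetical)
print("temporal:", [day for day, _ in ordered])
```

With the days in temporal order, connecting the counts with a line becomes meaningful, because adjacent points are adjacent in time.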
[Figure: time series plot of the average number of sunspots (0 to 150) by Year, 1750 to 1990]
Combinations
Small multiples is an approach advocated by Tufte to plotting multiple variables in a digestible way. This might be considered as combinations of basic plots.
[Figure: Beetles, a matrix of plots of the variables tars1, tars2, head, aede1, aede2, aede3]


Dotplots
Barley yield for two different years for 6 locations in Minnesota and 10 varieties
[Figure: dotplots of Barley Yield (bushels/acre), 20 to 60, for the years 1931 and 1932, one panel per location (Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids), with the varieties Trebi, Wisconsin No. 38, No. 457, Glabron, Peatland, Velvet, No. 475, Manchuria, No. 462, Svansota in each panel]


Boxplots or Dotplots
Different representations of the heights of the New York Choral Society
[Figure: boxplots and dotplots of Height (inches), 60 to 75, by voice part: Soprano 1, Soprano 2, Alto 1, Alto 2, Tenor 1, Tenor 2, Bass 1, Bass 2]
Basic plots form the core
How does Napoleon’s March use basic plots? Time series plot + map + barchart
How is John Snow’s cholera map related to a basic plot? Map + scatterplot
Napoleon’s March
Basic plot(s)?
Cholera map
AIDS in the NY Times
…And on a Global Scale
In many countries, reliable statistics on AIDS are hard to obtain. Unaids, the United Nations AIDS agency, relies on these numbers. Each square on the grid represents 2,500 people with AIDS.
[Figure: grid cartogram of countries in which each dot represents 2,500 people living with AIDS, colored by region: Highly Industrialized Countries, Latin America and the Caribbean, North Africa and the Middle East, Eastern Europe and Central Asia, Southern and Eastern Asia, Sub-Saharan Africa]
NEW CASES: The estimated number of new H.I.V./AIDS cases in highly industrialized countries has decreased slightly since the 1980’s but has continued growing in sub-Saharan Africa.
LIVING WITH AIDS: The estimated number of people living with H.I.V./AIDS has exploded in sub-Saharan Africa while staying relatively level in highly industrialized countries.
[Figure: two time series plots by region, ’80 to ’00*: new cases (0 to 4 million) and people living with AIDS (0 to 25 million)]
* Preliminary numbers
Source: UNAIDS
