C.Cordeiro
Motivation
Data Example
[Figure: time series plot; horizontal axis: years 1990 to 2015; vertical axis: values from 14 to 22]
Descriptive Statistics: Describing data using graphs and numerical summaries, such as mean, median, interquartile range, etc. In general, the purpose is to discover important patterns in the data and to display these patterns.
[...] weight related to height;
[...] is levels of CO2 related to increase of SST;
[...] individually.
[...] Y, with [...] Y;
Types of variables
Discrete (discretas): can only take certain, usually integer, values, e.g. the numbers of plants in a forest plot, the number of cells in a tissue section, the number of organisms in a sample of mud from a local estuary, etc.
Qualitative data are data that record categories, e.g. gender (male, female).
There are two levels of measurement:
Example
1   17   50   male
2   16   59   male
3   15   49   female
4   16   51   male
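The example table can be entered in R to see how each column is typed. This is only a sketch: the slide shows no column headers, so the names `id`, `age`, `weight` and `gender` below are assumptions made for illustration.

```r
# Rebuild the four rows of the example as a data frame.
# Column names are hypothetical; the original table has no visible headers.
students <- data.frame(
  id     = 1:4,
  age    = c(17, 16, 15, 16),
  weight = c(50, 59, 49, 51),
  gender = factor(c("male", "male", "female", "male"))
)

# str() lists each variable with its storage type:
# numeric columns are quantitative, factor columns are qualitative.
str(students)

# Class of every column, as a named character vector.
sapply(students, class)

# Levels of the qualitative variable.
levels(students$gender)
```

A factor is R's usual representation of a qualitative variable, which is why `gender` is wrapped in `factor()`.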
Let's work!
a) How many variables?
b) Identify and classify the variables.
Let's work!
Consider the data sets:
lynx
chickwts
PlantGrowth
InsectSprays
faithful
ToothGrowth
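One possible way to start on these data sets is to print their structure. The sketch below assumes the names refer to the data sets shipped with R's `datasets` package, which is attached by default.

```r
# Names of the data sets listed on the slide.
data_names <- c("lynx", "chickwts", "PlantGrowth",
                "InsectSprays", "faithful", "ToothGrowth")

# Print the structure of each one: number of observations,
# number of variables, and the type of each variable.
for (nm in data_names) {
  cat("\n==", nm, "==\n")
  str(get(nm))
}
```

Note that `lynx` is a time series (a single quantitative variable observed over time), while the others are data frames.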
Numerical summaries
[...] is important.
Measures of Location
Center measures
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
[...] is the value for which as many [...]
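A small sketch of the two center measures in R. The vector `x` is made-up illustration data, not taken from the slides.

```r
# Hypothetical sample of five observations.
x <- c(2, 4, 4, 5, 10)

# Mean computed directly from the formula: sum of the values divided by n.
mean_by_hand <- sum(x) / length(x)

# Built-in equivalents.
mean(x)    # arithmetic mean
median(x)  # middle value of the sorted data

# The hand computation and the built-in agree.
all.equal(mean_by_hand, mean(x))
```

Here the mean is 5 and the median is 4: the single large value 10 pulls the mean upward but leaves the median unchanged.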
Let's work!
a) whale = c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
b) penguin = c(17.1, 18.5, 19.7, 16.2, 21.3, 19.6, 16.2, 17.4, 17.3, 16.8, 19.5, 18.3)
4. [...]
Let's work!
The variable time [...]
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
This is not quite the average of the squared distances, as we have divided by n-1 and not n.
[...] interpretation.
Values far from the mean will have big deviations, which when squared will be even bigger.
So more spread-out data sets will have larger variances.
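The formula can be checked step by step in R. The vector `x` is hypothetical illustration data; `var()` and `sd()` are the built-in functions, which divide by n-1.

```r
# Hypothetical sample.
x <- c(2, 4, 4, 5, 10)
n <- length(x)

# Squared deviations from the mean.
sq_dev <- (x - mean(x))^2

# Sample variance: divide by n - 1, as in the formula.
s2_by_hand <- sum(sq_dev) / (n - 1)

# Plain average of the squared deviations (divide by n), for comparison.
avg_sq_dev <- sum(sq_dev) / n

s2_by_hand   # 9
avg_sq_dev   # 7.2
var(x)       # same as s2_by_hand
sd(x)        # square root of the variance: 3
```

The divisor n-1 makes the sample variance slightly larger than the plain average of the squared deviations.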
If our data has units, say kg, then the sample variance will be in kg^2.
S = \sqrt{\text{variance}} = \sqrt{S^2}
[...] scale is appropriate!
CV = \frac{s}{\bar{x}} \times 100\%
[...] different units, that is, [...] different means;
A variable with higher CV (in general, > 50%) is more dispersed than [...]
[...] positive values.
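Base R has no built-in CV function, so the sketch below defines a small helper, `cv()`, which is a name chosen here for illustration. It is applied to the `whale` and `penguin` vectors given in the exercises.

```r
# Coefficient of variation in percent: s / mean * 100.
# Only meaningful for variables that take positive values.
cv <- function(x) {
  stopifnot(all(x > 0))
  sd(x) / mean(x) * 100
}

whale   <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
penguin <- c(17.1, 18.5, 19.7, 16.2, 21.3, 19.6,
             16.2, 17.4, 17.3, 16.8, 19.5, 18.3)

cv(whale)
cv(penguin)

# The CV has no units, so the two samples can be compared directly.
cv(whale) > cv(penguin)
```

Because the CV is a ratio, it does not depend on the measurement scale of each sample.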
Let's work!
[...] ncases represents the number of cases of esophageal cancer.
[...] statistical inference.
When trying to understand a parent distribution from the data, we discuss some assumptions about the exact shape of a distribution.
In inferential statistics, a primary role is played by a particular shape: the normal shape.
Kurtosis
Measures the flatness (achatamento) of a distribution.
A normal distribution has a value of 3. A kurtosis > 3 indicates a sharp peak with heavy tails closer to the mean (leptokurtic). A kurtosis < 3 indicates the opposite, a flat top (platykurtic).
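Base R does not provide a kurtosis function, so the sketch below computes the moment-based version (fourth central moment divided by the squared second central moment), for which a normal distribution gives 3. The helper name `kurtosis_moment` and the divisor n are assumptions; add-on packages may apply small-sample corrections and return slightly different values.

```r
# Moment-based kurtosis: m4 / m2^2, with both moments divided by n.
kurtosis_moment <- function(x) {
  x  <- x[!is.na(x)]       # drop missing values
  d  <- x - mean(x)        # deviations from the mean
  m2 <- mean(d^2)          # second central moment
  m4 <- mean(d^4)          # fourth central moment
  m4 / m2^2
}

# Evenly spread values have a flat shape: kurtosis below 3 (platykurtic).
flat <- c(1, 2, 3, 4, 5)
kurtosis_moment(flat)   # 1.7

# A large simulated normal sample should give a value close to 3.
set.seed(1)
kurtosis_moment(rnorm(10000))
```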
[...] esoph, obtain the tabulated data.
Quantitative
Discrete variable: [...] (frequencies)
Continuous variable: [...] classes/bins [...] 100 is given by 2\sqrt{n} bins).
[...] babyboom
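A sketch of both kinds of tabulation in R. The discrete case uses `ncases` from the built-in `esoph` data. For the continuous case, the `penguin` vector from the exercises stands in for `babyboom`, which is not part of base R; the number of classes follows the 2\sqrt{n} rule as read from the slide, rounded up, which is an assumption about how the rule is meant to be applied.

```r
# Discrete variable: one row per distinct value.
counts <- table(esoph$ncases)
freq_table <- data.frame(
  value    = names(counts),
  freq     = as.vector(counts),
  rel_freq = round(as.vector(prop.table(counts)), 3)
)
freq_table

# Continuous variable: group the values into classes (bins) first.
penguin <- c(17.1, 18.5, 19.7, 16.2, 21.3, 19.6,
             16.2, 17.4, 17.3, 16.8, 19.5, 18.3)
k <- ceiling(2 * sqrt(length(penguin)))   # number of classes
classes <- cut(penguin, breaks = k)       # k equal-width intervals
table(classes)
```

With 12 observations the rule gives 7 classes, which is a lot for so few values; in practice the number of classes is often adjusted by eye.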
[...] babyboom, obtain the [...]
Quantitative
Discrete variable: [...]
Continuous variable: [...]
Let's work!
10. Consider the data [...]
11. [...] esoph, [...]
Graphical tools
There are several graphical tools to view the shape of the data distribution:
Stem-and-leaf plot (caule-e-folhas): The data set can be represented in an organized compact manner. A good option when analysing the data set by hand.
Example: 2, 3, 16, 23, 14, 12, 4, 13, 2, 0, 0, 0, 6, 28, 31, 14, 4, 8, 2, 5
Histogram (histograma): Represents the data points with a bar of a given area.
Boxplots (caixa de bigodes): A graphical device based on the quartiles (Q1, Q2, Q3).
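The three tools can be tried on the example data with base R graphics. This is a minimal sketch; plot titles and labels are arbitrary choices.

```r
# Example data from the slide.
x <- c(2, 3, 16, 23, 14, 12, 4, 13, 2, 0, 0, 0, 6, 28, 31, 14, 4, 8, 2, 5)

# Stem-and-leaf plot, printed as text in the console.
stem(x)

# Histogram: bar areas represent the number of points in each class.
hist(x, main = "Histogram of x", xlab = "x")

# Boxplot: box drawn from the quartiles, whiskers towards the extremes.
boxplot(x, horizontal = TRUE, main = "Boxplot of x")

# The quartiles behind the boxplot (R's default quantile method).
quartiles <- quantile(x, c(0.25, 0.5, 0.75))
quartiles
```

For these data the quartiles are Q1 = 2, Q2 = 5.5 and Q3 = 14, so the upper half of the data is much more spread out than the lower half.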
Robust measures
[...] mean and the standard deviation [...] but they suffer when the data has long tails or many outliers.
The median [...] is such a resistant measure.
The IQR [...] observations.
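The effect of a single outlier can be seen with a short experiment. The numbers below are invented for illustration.

```r
# Hypothetical sample, and the same sample with one extreme value added.
x     <- c(10, 12, 11, 13, 12)
x_out <- c(x, 100)

# Non-resistant measures: both change a lot.
c(mean(x), mean(x_out))
c(sd(x), sd(x_out))

# Resistant measures: barely affected by the outlier.
c(median(x), median(x_out))
c(IQR(x), IQR(x_out))
```

The mean more than doubles when the outlier is added, while the median stays at 12 and the IQR only moves from 1 to 1.5.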
Let's work!
12. [...]
14. Consider the data [...]
Bivariate data
[...] two variables:
[...] esoph, agegp and tobgp.
[...] faithful.
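A sketch of how two variables might be examined together in R, using the built-in `esoph` and `faithful` data. Choosing a contingency table for the two qualitative variables and a scatter plot with a correlation for the two quantitative ones is an assumption about what the slide intends.

```r
# Two qualitative variables: contingency table of age group by tobacco group.
# Note: this counts the rows (groups) of esoph, not individual patients.
tab <- table(esoph$agegp, esoph$tobgp)
tab

# Two quantitative variables: scatter plot of the Old Faithful data.
plot(faithful$eruptions, faithful$waiting,
     xlab = "Eruption time (min)",
     ylab = "Waiting time to next eruption (min)")

# Pearson correlation as a numerical summary of the linear association.
r <- cor(faithful$eruptions, faithful$waiting)
r
```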
15. Consider the data [...]
16. The data normtemp [...] (temperature).
a) Identify and classify the variables.
b) Draw a histogram of the variable temperature.
c) Obtain the mean body temperature, and check whether this statistical measure is adequate for the data.
d) The variable gender is 1 for male and 2 for female. Draw a boxplot by gender. Do you think the body temperatures are similar?
e) Using the mean, median, maximum and minimum, compare the genders with respect to resting heart rate hr.
f) Which gender recorded the highest heart rate? And the lowest? Give the respective values.
g) Classify the skewness of the two quantitative variables using a graph, and also give their values.
h) Group the variable temperature into classes and represent it graphically.
i) Present the contingency table for the two qualitative variables.
17. [...] reaction.time [...]