
Basic statistics for economists
Spring 2018 C-D
Department of Statistics
NCT 1-2; (16.1)

Today
Describe data (variables) by means of
● Tables
– Distributions, frequency distributions (frequency = number of)
● Diagrams, graphics
– One variable at a time, univariate
– Categorical and numerical variables
● Numerically
– Location (“where are the values typically located”)
– Variation, dispersion (“how spread out are they”)

20 March 2018 Michael Carlson, Dep. of Statistics 2


From L1

Classification of variables

Variables
– Categorical (qualitative)
– Numerical (quantitative)
  – Discrete
  – Continuous



From L1

Scale level
Numerical, quantitative data
● Ratio: differences and ratios are well-defined, true zero exists
● Interval: differences are well-defined but not ratios, true zero does not exist

Categorical, qualitative data
● Ordinal: ordered categories (ranking order)
● Nominal: categories but no natural ordering exists



Frequency distribution – one variable
● Number of objects/observations that share the same property

$n_k$ = no. of objects with property $k$

● The entire set of all $n_k$ over all possible $k$ is called the
frequency distribution (sv. frekvensfördelningen)

● If there are in total $n$ objects and in total $C$ different categories
we have
$\sum_k n_k = n_1 + n_2 + \dots + n_C = n$

● Relative frequencies (%): $100 \cdot \frac{n_k}{n}$

Works for categorical and discrete numerical variables – but how do we
deal with continuous variables?



Frequency tables – count the numbers
● Categorical – nominal or ordinal
● Numerical – discrete or class-divided continuous

Count the number that fall into each defined category:

Flavor        Frequency   Relative frequency %
Chocolate     70          35,0 %
Vanilla       50          25,0 %
Strawberry    45          22,5 %
Raspberry     30          15,0 %
Licorice      5           2,5 %
Sum           200         100 %

● Nominal scale; relative frequency % = $100 \cdot \frac{n_k}{n}$
● Pareto ordering: largest first, smallest last (NCT pp. 32-33)

Note! Frequencies are always numerical but the variable is not
necessarily numerical!
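A minimal Python sketch of the flavor table: it takes the frequencies from the slide, computes the relative frequencies 100 · n_k / n and lists the categories in Pareto order. The function name `relative_frequencies` is only an illustrative choice, not something from the course material.

```python
from collections import Counter

# Frequencies n_k taken from the flavor table on this slide.
counts = Counter({"Chocolate": 70, "Vanilla": 50, "Strawberry": 45,
                  "Raspberry": 30, "Licorice": 5})

n = sum(counts.values())  # total number of observations


def relative_frequencies(freqs):
    """Return (category, n_k, 100 * n_k / n) rows in Pareto order (largest first)."""
    total = sum(freqs.values())
    return [(cat, n_k, 100 * n_k / total) for cat, n_k in freqs.most_common()]


for cat, n_k, pct in relative_frequencies(counts):
    print(f"{cat:<12}{n_k:>5}{pct:>8.1f} %")
```

`Counter.most_common()` sorts by descending frequency, which is exactly the Pareto ordering recommended for nominal data.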



Frequency tables, cont.
Categorical, ordinal scale:

Grade   Frequency   Relative frequency %
A       30          15,0 %
B       56          28,0 %
C       80          40,0 %
D       20          10,0 %
E       14          7,0 %
Sum     200         100 %

Arrange in order of ranking! Pareto not recommended!

Discrete, ratio scale:

No. of points           0     1      2      3      4      Sum
Frequency               7     42     98     63     70     280
Relative frequency %    2,5   15,0   35,0   22,5   25,0   100

Arrange in order of numerical magnitude! Pareto not recommended!



Class separated continuous variable
● Continuous numerical variables (or discrete with many values)
may be grouped into classes, bins – i.e. intervals

● Separation into classes, categorization (sv. klassindelning)

● Classes and class widths must be defined
ex. (0-4,99) (5-9,99) (10-19,99) (20 - ) ⟵ (open class, ≥20)

● Same class width or varying width? What does NCT say?

● Summarize in a table, ordered by magnitude of the values of the
class separated continuous variable
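A small Python sketch of class division, using the class limits from the example on this slide. The observations are hypothetical (made up for illustration), and `classify` is an assumed helper name.

```python
from bisect import bisect_right
from collections import Counter

# Lower class limits from the example: [0, 5), [5, 10), [10, 20), [20, ...)
lower_limits = [0, 5, 10, 20]
labels = ["0-4,99", "5-9,99", "10-19,99", "20-"]


def classify(value):
    """Return the label of the class that contains value."""
    if value < lower_limits[0]:
        raise ValueError("value below the lowest class limit")
    # bisect_right counts how many lower limits are <= value
    return labels[bisect_right(lower_limits, value) - 1]


# Hypothetical observations, made up for illustration only.
observations = [0.5, 3.2, 4.99, 5.0, 7.7, 12.0, 19.99, 20.0, 35.1]

frequencies = Counter(classify(x) for x in observations)
for label in labels:  # listed in order of magnitude, not Pareto order
    print(label, frequencies[label])
```

Because the classes are non-overlapping and together cover all values from 0 upwards, every observation lands in exactly one class.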



Class separated continuous variable
● Cumulative – indicates the total number of observations whose
values are (e.g.) less than the upper limit of each class

Income per     Frequency   Relative   Cumulative   Cumulative
month, tkr                 freq.      freq.        relative freq.
< 20           20          10,0 %     20           10,0 %
21 – 40        40          20,0 %     60           30,0 %
41 – 60        74          37,0 %     134          67,0 %
61 – 80        40          20,0 %     174          87,0 %
81 – 100       18          9,0 %      192          96,0 %
101 – 120      6           3,0 %      198          99,0 %
121 – 140      2           1,0 %      200          100,0 %
> 140          0           0,0 %      200          100,0 %
Sum            200         100 %

8 bins. The last column holds the accumulated relative frequencies.
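The cumulative columns are running totals of the frequency column. A minimal Python sketch using the frequencies from the income table (the variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies from the income table (8 bins).
frequencies = [20, 40, 74, 40, 18, 6, 2, 0]
n = sum(frequencies)

cumulative = list(accumulate(frequencies))          # running totals
cumulative_pct = [100 * c / n for c in cumulative]  # accumulated relative freq. (%)

for f, c, pct in zip(frequencies, cumulative, cumulative_pct):
    print(f"{f:>4}{c:>6}{pct:>8.1f} %")
```

The last cumulative value always equals n, and the last cumulative percentage is 100 %.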



Graphical presentation – one variable
Frequencies (absolute or relative)
● Bar charts (sv. stapeldiagram)
– categorical, nominal and ordinal; discrete numerical
– ordered in the same way as we did for freq. tables
● Pie charts
– categorical, nominal
● Bar charts with separated, thin bars or lines (sv. stolpdiagram)
– discrete numerical with few values
● Histogram (adjacent bars, intervals)
– class separated continuous variable, discrete with many values



Diagram types – qualitative, categorical

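Bar charts
● Nominal scale: Pareto ordered
[Figure: bar chart with the categories Hallon (raspberry), Blåbär (blueberry), Choklad (chocolate), Vanilj (vanilla), Lakrits (licorice)]
● Ordinal scale: not Pareto ordered
[Figure: bar chart with the categories Mycket dåligt (very bad), Dåligt (bad), OK, Bra (good), Mycket bra (very good)]

Pie charts
● Nominal scale
● Start at 12 o’clock and move clockwise, starting with the largest, second largest and so on …
[Figure: pie chart with the same five flavor categories]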



Diagram types - numerical discrete

Bar chart (sv. stolp- el. stapeldiagram)

Ratio scale (also interval scale)

[Figure: bar chart; x-axis: no. of rooms per apartment (1 to 6); y-axis: no. of apartments (frequencies)]

Ordered by magnitude



Histogram - numerical continuous
● Histograms are used for continuous variables

● Class widths are defined (bins)

● Inclusive and non-overlapping – each observation belongs to
exactly one class

● The frequency of a class is represented by the area of the bar, not
its height (see NCT pp. 52-53)
– however, if class widths are the same for all bins the heights
are proportional to the areas and thus to the frequencies

● Open classes (e.g. >65) are indicated with e.g. dotted lines
– we don’t know where the class ends and thus neither its area!
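Since area (not height) represents frequency, the bar height for a class is its frequency divided by its width. A minimal Python sketch under that reading; the class limits and frequencies are hypothetical, and `bar_heights` is an assumed helper name.

```python
def bar_heights(class_limits, frequencies):
    """Height of each histogram bar so that area = frequency.

    class_limits has one more entry than frequencies: the limits of
    adjacent, non-overlapping classes in increasing order.
    """
    if len(class_limits) != len(frequencies) + 1:
        raise ValueError("need exactly one more limit than frequencies")
    widths = [hi - lo for lo, hi in zip(class_limits, class_limits[1:])]
    return [f / w for f, w in zip(frequencies, widths)]


# Hypothetical classes 0-5, 5-10, 10-20 with made-up frequencies.
limits = [0, 5, 10, 20]
freqs = [10, 20, 20]
heights = bar_heights(limits, freqs)
print(heights)
```

In this made-up example the last two classes have the same frequency, but the wider class gets a bar half as tall, so the two areas stay equal.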



Histogram

Equal class widths
● Height is proportional to the area
[Figure: histogram with equal class widths; y-axis: Frekvens (frequency), 0 to 80]

Unequal class widths
● (not possible with Excel)
[Figure: histogram with unequal class widths. By Qwfp at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=20290683]



Cumulative relative frequency - Ogive
[Figure: histogram with the ogive (red line) overlaid; left axis: Frekvens (frequency), 0 to 80; right axis: Kumulativa procenttal (cumulative percentages), 0% to 120%]

Note! Different axis scales!
– frequency on the left side
– % on the right side

Note! The red ogive is an increasing function (never decreasing) and never goes
higher than 100 %



Cumulative rel. freq. – Step function
● The increase at each step is equal to the height of the
corresponding bar in the histogram

[Figure: left panel, histogram of relative frequencies (0,0% to 40,0%) over the classes 0-20, 21 – 40, 41 – 60, 61 – 80, 81 – 100, 101 – 120, 121 – 140, > 140, with two bars marked height = A and height = B; right panel, cumulative step function (0,0% to 120,0%) over 0 to 160, with the corresponding steps marked increase = A and increase = B]

Increases (never decreases) and never goes above 100 %



Quick summary:
What type of diagram do we use to show a frequency distribution?

● Categorical, nominal: bar chart, Pareto-ordered; pie chart

● Categorical, ordinal: bar chart, ordered by rank (lowest - highest)

● Numerical, discrete, few values: bar chart, one bar for each discrete value

● Numerical, discrete, many values: histogram, divided into classes
(approximates continuous)

● Numerical, continuous: histogram, divided into classes

● Visualizing a cumulative frequency distribution?
Ogive or step function (discrete and continuous)



Skip this!

Stem-and-Leaf Displays
● sv. stambladdiagram
● Provides exact values and visualizes the distribution
● In this example
– stems = tens (10, 20, …)
– leaves = ones (0, 1, 2,…, 9)
● Not very common these days; it dates from before the era of graphical printing

[Figure: stem-and-leaf display with the stems 1 to 5]



NCT 16.1

Time series, observations over time


● Line chart, time series plot
– visualizes changes over time
– time on the x-axis, observed values on the y-axis
– points are connected

Price
Year   Milk   Soda pop
1975   1,34   1,20
1976   1,43   1,33
1977   1,63   1,38
1978   1,92   1,46
1979   2,07   1,51
1980   2,41   1,66
1981   2,87   1,84
1982   3,38   2,01

[Figure: line chart of the prices of Mjölk (milk) and Sockerdricka (soda pop), 1975 to 1982; y-axis from 0 to 4]



Numerical measures - summaries
● Define numerical measures that summarize the most important
properties of a set of observations

● Location – where are the observations?
– Around 4, 25, 100 or 10 000?
– Measures of location, central tendency

● Dispersion – how spread out are the observations from the
central location?
– about 2-8 or 4-500? Or in the interval 100±20?
– Many close to the center? Or in the “tails” i.e. endpoints?
– Measures of variability



The data and notation

● Denote the variable by $x$ (or some other letter $y, z, u, v, \dots$)

● $n$ observations, sample size ($N$ population size)

● Denote by indexing with $i = 1, 2, 3, \dots, n-1, n$ (labels)

● Value of the $i$th observation of the variable $x$ is denoted by $x_i$

● The entire set of observed values may be denoted as

$\{x_1, x_2, x_3, \dots, x_{n-1}, x_n\}$



Measure of Location: Mean
● Arithmetic sample mean (sv. medelvärde)
● Sum all and divide by the number
● $x$-bar

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

● The mean is sensitive to extreme values:

ex. {2, 3, 4, 5} ⇒ $\bar{x}$ = 3,5
ex. {2, 3, 4, 25} ⇒ $\bar{x}$ = 8,5
ex. {22, 23, 24, 25} ⇒ $\bar{x}$ = 23,5

● Population mean often denoted $\mu$ or $\mu_x$ (”mu”, sv. ”my”)

$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
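The three examples can be checked with a few lines of Python. This is only a sketch; `sample_mean` is an illustrative name, and the standard library's `statistics.mean` is used for comparison.

```python
from statistics import mean


def sample_mean(xs):
    """Arithmetic mean: sum all values and divide by their number."""
    return sum(xs) / len(xs)


for data in ([2, 3, 4, 5], [2, 3, 4, 25], [22, 23, 24, 25]):
    print(data, sample_mean(data), mean(data))
```

Replacing the single value 5 by 25 moves the mean from 3,5 to 8,5, which illustrates the sensitivity to extreme values.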



More on means – grouped values
● $n$ observations, distributed such that we have $n_k$ of them sharing
the same value $x = k$
● E.g. $n_0$ zeroes ($x = 0$), $n_1$ ones ($x = 1$), $n_2$ twos ($x = 2$), … etc. up
to $n_C$ with value $x = C$

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \sum_{k=0}^{C} n_k \cdot k = \sum_{k=0}^{C} \frac{n_k}{n} \cdot k$

where $\frac{n_k}{n}$ is the proportion with value $k$

● Ex. 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3 yields

$\bar{x} = \sum_{k=0}^{3} \frac{n_k}{n} k = \frac{1}{12}(3 \cdot 0 + 2 \cdot 1 + 3 \cdot 2 + 4 \cdot 3) = \frac{20}{12} = \frac{5}{3} = 1,6667$

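The grouped-value formula for the mean can be sketched in Python with the example 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3. Exact fractions are used here (an illustrative choice) so that the result 5/3 is visible without rounding.

```python
from collections import Counter
from fractions import Fraction

data = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
n = len(data)
n_k = Counter(data)  # frequency n_k of each value k

# Weighted sum: each value k weighted by its proportion n_k / n.
grouped_mean = sum(Fraction(count, n) * k for k, count in n_k.items())

print(dict(n_k))
print(grouped_mean, float(grouped_mean))
```

The weighted sum gives the same value as summing all twelve observations and dividing by 12.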


Not required but useful to know
Geometric mean
● Monthly interest on an investment over 6 months:
10%, 7%, -2%, 11%, 8%, 4%

● Total growth over the entire period is:
1,10 · 1,07 · 0,98 · 1,11 · 1,08 · 1,04 = 1,4381
i.e. +43,81%

● Geometric mean: $\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n}$

● Here $\bar{x}_g = \sqrt[6]{1,4381} = 1,0624$
i.e. average interest per month = 6,24%
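The interest example can be reproduced with a short Python sketch (variable names are illustrative; `math.prod` requires Python 3.8 or later):

```python
import math

# Monthly interest rates from the example, as decimals.
rates = [0.10, 0.07, -0.02, 0.11, 0.08, 0.04]
factors = [1 + r for r in rates]  # growth factors 1,10, 1,07, ...

total_growth = math.prod(factors)                    # growth over all 6 months
geometric_mean = total_growth ** (1 / len(factors))  # 6th root of the product

print(round(total_growth, 4), round(geometric_mean, 4))
```

Growing by the geometric mean factor every month for six months gives the same total growth as the six actual monthly factors.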



Location, central tendency: Median
● The median separates a numerical dataset in half
● 50% of the observations lie on each side of the median

● Arrange the observations in increasing order, smallest-largest
n even ⇒ median = mean of the two in the middle
n odd ⇒ median = the middle value

ex. {2, 3, 4, 5} ⇒ median = 3,5
ex. {2, 3, 4, 25} ⇒ median = 3,5
ex. {2, 3, 4, 25, 135} ⇒ median = 4

● Less sensitive to extreme values



Location: Mode
● Sv. typvärde
● The most frequently occurring value; largest frequency

ex. {4, 2, 3, 3, 5, 1, 3, 5} ⇒ Mode = 3 (unimodal)

ex. {5, 2, 3, 3, 5, 1, 3, 5} ⇒ Mode = 3 and 5 (bimodal)

● Useful for categorical variables (nominal and ordinal scales)

ex. {b, a, c, b, d, b, a, e} ⇒ Mode = b
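The median and mode examples from these slides can be checked with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). This is a sketch for self-study, not part of the course material.

```python
from statistics import median, multimode

# Median: even n gives the mean of the two middle values, odd n the middle value.
print(median([2, 3, 4, 5]))
print(median([2, 3, 4, 25]))
print(median([2, 3, 4, 25, 135]))

# Mode: multimode returns every value that shares the largest frequency.
unimodal = multimode([4, 2, 3, 3, 5, 1, 3, 5])
bimodal = multimode([5, 2, 3, 3, 5, 1, 3, 5])
categorical = multimode(["b", "a", "c", "b", "d", "b", "a", "e"])
print(unimodal, sorted(bimodal), categorical)
```

Note that the mode also works for the categorical data, whereas the median and the mean need numerical values.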



Variability: Range

● The size of the observed range of values, the interval within
which all observations lie

● Difference between the largest and smallest values

Range = Max – Min

● Sensitive to extreme values



Variability: Quartiles – Interquartile Range
Arrange the observations in increasing order:

● Q1 = 1st quartile
25 % of observations below, 75 % above
● Q3 = 3rd quartile
75 % of observations below, 25 % above

● IQR = Interquartile Range = Q3 – Q1
(sv. kvartilavstånd)

● 50 % of the observations lie in an interval that is IQR wide



See NCT p. 64-67
Formulary p. 2
Percentiles
Let $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ denote the ordered sample, ordered by size from
the smallest value $x_{(1)}$ to the largest $x_{(n)}$

● Let $a$ = integer part of $\frac{p}{100}(n+1)$
● Let $b$ = decimal part of $\frac{p}{100}(n+1)$

● $p$th percentile = $x_{(a)} + b \cdot (x_{(a+1)} - x_{(a)})$

Ex. {11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40}, $n = 12$

40th percentile: $\frac{40}{100}(n+1) = (12+1) \cdot 0,4 = 5,2$ ⇒ $a = 5$ and $b = 0,2$

40th percentile = $x_{(5)} + 0,2 \cdot (x_{(5+1)} - x_{(5)}) = 17 + 0,2 \cdot (18 - 17) = 17,2$
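The percentile rule above can be sketched as a small Python function. The name `percentile` and the choice to raise an error when the position falls outside the ordered sample are assumptions made for this illustration; the slide does not say how such cases are handled.

```python
import math


def percentile(data, p):
    """p-th percentile using the position (n + 1) * p / 100 in the ordered sample."""
    xs = sorted(data)
    n = len(xs)
    position = (n + 1) * p / 100
    if position < 1 or position > n:
        # Assumption: no extrapolation beyond the smallest/largest value.
        raise ValueError("percentile position falls outside the ordered sample")
    a = math.floor(position)  # integer part
    b = position - a          # decimal part
    if a == n:                # position is exactly the largest observation
        return xs[-1]
    # xs is 0-indexed, so x_(a) is xs[a - 1] and x_(a+1) is xs[a].
    return xs[a - 1] + b * (xs[a] - xs[a - 1])


sample = [11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40]
print(round(percentile(sample, 40), 2))
```

With p = 25, 50 and 75 the same function gives the quartiles and the median by the same rule.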



Median - example

● n = 12 observations ordered by size:
{11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40}

● Start with (n+1) = 13

● (n+1)·0,5 = 6,5 (0,5 because it’s half), i.e. in between the 6th and 7th

● median = md = P50 = (18+20)/2 = 19
= mean value of the 6th and the 7th observations



Quartiles - example
● 1st quartile = 25th percentile
(n+1)·0,25 = 3,25 ⇒ a = 3, b = 0,25, i.e. between the 3rd and 4th observations (14 and 15)
Δ = 15-14 = 1
Q1 = 14 + 0,25·1 = 14,25
● 3rd quartile = 75th percentile
(n+1)·0,75 = 9,75 ⇒ a = 9, b = 0,75, i.e. between the 9th and 10th observations (21 and 23)
Δ = 23-21 = 2
Q3 = 21 + 0,75·2 = 22,50
● IQR = Q3 – Q1 = 8,25



With Excel
● The same data as in the example are in the cells A1–A12
● Write the following functions in any empty cell:

=MIN(A1:A12)
=QUARTILE.EXC(A1:A12;1)
=MEDIAN(A1:A12)
=QUARTILE.EXC(A1:A12;3)
=MAX(A1:A12)

English and Swedish versions of Excel functions, see e.g.
http://www.exceldepartment.com/excelkurs/extramaterial/excelfunktioner-svenska-engelska/
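For those who prefer Python to Excel, the same five numbers can be sketched with the standard `statistics` module (Python 3.8 or later). With `method="exclusive"`, `quantiles` places the quartiles at positions based on n + 1, which reproduces the values worked out by hand on the previous slides.

```python
from statistics import median, quantiles

data = [11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40]

# Quartiles Q1, Q2 (median), Q3 with positions based on (n + 1).
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, median(data), q3, max(data))
print(five_number_summary)
```

Other software may use other interpolation rules for quartiles, so small differences between programs are to be expected.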



Box-and-whisker plots – visual summary
● We need:
– smallest and largest values - min and max
– median, 1st and 3rd quartiles - Md, Q1 and Q3

”Five-number summary” NCT p. 65

Definition of extreme values (according to Tukey):

● Outliers: values that lie more than 1,5⨯IQR below Q1 or above Q3
● Extreme outliers: values that lie more than 3⨯IQR below Q1 or above Q3
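A minimal Python sketch of Tukey's rule. Applying it to the 12-observation sample from the quartile example (Q1 = 14,25 and Q3 = 22,50) is an illustration added here, and `tukey_outliers` is an assumed helper name.

```python
def tukey_outliers(data, q1, q3):
    """Return (outliers, extreme_outliers) according to the 1.5*IQR and 3*IQR rules."""
    iqr = q3 - q1
    outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    extreme = [x for x in data if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
    return outliers, extreme


data = [11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40]
outliers, extreme = tukey_outliers(data, q1=14.25, q3=22.5)
print(outliers, extreme)
```

Here the upper limit for outliers is 22,5 + 1,5 · 8,25 = 34,875, so 40 is flagged while 30 is not; no value exceeds the limit for extreme outliers, 22,5 + 3 · 8,25 = 47,25.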



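Box plots, cont.
”Five-number summary”: Min, Q1, Median, Q3, Max

[Figure: box plot on an axis from 20 to 80, with Min, Q1, Median, Q3 and Max marked; the box spans Q1 to Q3 (width IQR), a distance of 1,5 × IQR is marked on each side of the box, and extreme values are shown beyond Max]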



NCT p. 73-74

Variability: Variance
● Average squared distance to the mean

● Sample and population variances:

$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

$\sigma_x^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$ (”sigma”)

– Note! For samples, divide by $n - 1$ rather than $n$
– Unit of measurement is transformed to square units

● Standard deviation: $s_x = \sqrt{s_x^2}$, $\sigma_x = \sqrt{\sigma_x^2}$
– Restores unit of measurement
– sv. standardavvikelse

● Coefficient of Variation: read on your own in NCT p. 75



Variance – alternative formulas
● Sample variance

$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$

Shortcut formula, sometimes easier to use

Excel: ’=VAR.S(…)’

● Population variance

$\sigma_x^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N}$

Excel: ’=VAR.P(…)’
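The shortcut formula follows from expanding the square and using $\sum_{i=1}^{n} x_i = n\bar{x}$. A short derivation, added here as a supplement to the slide:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{align*}
```

The population version is identical, with $N$ and $\mu$ in place of $n$ and $\bar{x}$.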



Variance
● Four observations {2, 3, 5, 8}; mean $\bar{x}$ = 4,5
● Distance to mean $x_i - \bar{x}$, square them and sum:

                                                  Sum
$x_i$                 2      3      5      8      18
$x_i - \bar{x}$       -2,5   -1,5   0,5    3,5    0
$(x_i - \bar{x})^2$   6,25   2,25   0,25   12,25  21
$x_i^2$               4      9      25     64     102

[Figure: the four observations plotted on an axis from 0 to 9]

● Calculate the variance:
– divide by $n - 1 = 3$ or $N = 4$?

$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{21}{3} = 7$

$s_x^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{102 - 4 \cdot 4,5^2}{4-1} = \frac{21}{3} = 7$

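The variance example with {2, 3, 5, 8} can be verified in Python, both with the definition and with the shortcut formula. `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by N, comparable to the two Excel functions mentioned earlier. The variable names are illustrative.

```python
from statistics import pvariance, variance

data = [2, 3, 5, 8]
n = len(data)
x_bar = sum(data) / n

# Definition: sum of squared distances to the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Shortcut formula: (sum of squares - n * mean^2) / (n - 1).
s2_shortcut = (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)

print(s2_definition, s2_shortcut, variance(data), pvariance(data))
```

Dividing by N = 4 instead gives 21/4 = 5,25, which is the answer if the four values are regarded as the whole population.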


Quick quiz:
Which of the following statements about variances are true?

1. Variances are always positive: False, can be zero

2. Variances can never be negative: True

3. Variances can never equal zero: False

4. If you add 10 to every observation in the data, the variance
will be unaffected: True

5. If you multiply every observation with 10, the variance will be
100 times larger: True
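Statements 1, 4 and 5 can be checked numerically. A small Python sketch, reusing the observations {2, 3, 5, 8} from the variance example (the constant dataset is made up for illustration):

```python
from statistics import variance

data = [2, 3, 5, 8]
shifted = [x + 10 for x in data]   # add 10 to every observation
scaled = [x * 10 for x in data]    # multiply every observation by 10
constant = [4, 4, 4, 4]            # no variation at all

print(variance(data), variance(shifted), variance(scaled), variance(constant))
```

Adding a constant moves every observation and the mean by the same amount, so the distances to the mean are unchanged; multiplying by 10 makes every distance 10 times larger and every squared distance 100 times larger.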



NCT p. 75-76

Chebyshev’s theorem and the Empirical rule


● Provides a description of how spread out our observations are, in
relation to the standard deviation (variance):

Rule         μ ± σ     μ ± 2σ    μ ± 3σ
Chebyshev:   0 %       75 %      88,89 %    Guaranteed (at least)
Empirical:   ca 68 %   ca 95 %   ca 100 %   Under some conditions (”bell-shaped”)

● Compare to Q1, Q3 and IQR

[Figure: bell-shaped curve with the interval μ ± 2σ covering 95%]
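The Chebyshev row of the table comes from the bound 1 − 1/k² for the interval μ ± kσ. A minimal Python sketch (the function name is an illustrative choice):

```python
def chebyshev_lower_bound(k):
    """Guaranteed minimum proportion of observations within k standard deviations."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k ** 2)


for k in (1, 2, 3):
    print(k, f"{100 * chebyshev_lower_bound(k):.2f} %")
```

For k = 1 the bound is 0 %, i.e. the theorem guarantees nothing within one standard deviation, whereas the empirical rule gives about 68 % for bell-shaped distributions.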



NCT p. 91

Skewness – sv. snedhet - useful knowledge

● If the distribution looks as if it has been “pulled out” to one
side, we say the distribution is skewed (sv. sned)
● Symmetric if it is equally distributed on both sides (non-skewed)

Left skewed: Mean ≠ Median ≠ Mode
Symmetric: Mean = Median = Mode
Right skewed: Mode ≠ Median ≠ Mean



Variable type & descriptive measures

Variables

● Categorical: nominal and ordinal scale
– Location: mode (sv. typvärde)
– Variability: no. of categories/levels

● Numerical (discrete or continuous): interval and ratio scale
– Location: mode, median (Q1 & Q3), mean
– Variability: range, IQR, variance & standard deviation



Next time …
… we’ll continue with descriptive statistics and discuss how to
describe and study two variables:
● Tables and graphs etc.

Especially the relationship between two numerical variables


● Graphically
– scatter plots
● Measures of Relationships between two variables:
– covariance and correlation coefficient



Exercise: DIY

$i$                   1   2    3    4   5   6    7   8    9   10   Sum
$x_i$                 5   2    3    6   5   2    5   3    5   4    40

Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{10}(5 + 2 + \dots + 4) = \frac{40}{10} = 4,0$

$x_i - \bar{x}$       1   −2   −1   2   1   −2   1   −1   1   0    0
$(x_i - \bar{x})^2$   1   4    1    4   1   4    1   1    1   0    18

Variance: $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{9}(1 + 4 + 1 + \dots + 0) = \frac{18}{9} = 2,0$

Standard deviation: $s_x = \sqrt{s_x^2} = \sqrt{2,0} = 1,4142…$



Exercise: DIY, cont.

$i$       1    2   3   4    5    6   7    8   9    10   Sum
$x_i$     5    2   3   6    5    2   5    3   5    4    40

Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{10}(5 + 2 + \dots + 4) = \frac{40}{10} = 4,0$

$x_i^2$   25   4   9   36   25   4   25   9   25   16   178

Variance, alt. formula: $s_x^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right) = \frac{1}{9}(178 - 10 \cdot 4^2) = \frac{178 - 160}{9} = 2,0$

Mode = 5
Range = Max − Min = 6 − 2 = 4



$n = 10 \Rightarrow n + 1 = 11$
Exercise: DIY, cont.

$(i)$       1   2   3   4   5   6   7   8   9   10   Sum
$x_{(i)}$   2   2   3   3   4   5   5   5   5   6    40

Median: 50% of $n + 1$ = 5,5 ⇒ $a = 5$, $b = 0,5$

$x_{(5)} + 0,5 \cdot (x_{(6)} - x_{(5)}) = 4 + 0,5 \cdot (5 - 4) = 4,5$

Q1: 25% of $n + 1$ = 2,75 ⇒ $a = 2$, $b = 0,75$

$x_{(2)} + 0,75 \cdot (x_{(3)} - x_{(2)}) = 2 + 0,75 \cdot (3 - 2) = 2,75$

Q3: 75% of $n + 1$ = 8,25 ⇒ $a = 8$, $b = 0,25$

$x_{(8)} + 0,25 \cdot (x_{(9)} - x_{(8)}) = 5 + 0,25 \cdot (5 - 5) = 5$

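The answers to the DIY exercise can be checked with Python's standard `statistics` module (Python 3.8 or later). This is only a sketch for self-checking; `method="exclusive"` is chosen because it places the quartiles at positions based on n + 1, as in the hand calculation.

```python
from statistics import mean, multimode, quantiles, stdev, variance

data = [5, 2, 3, 6, 5, 2, 5, 3, 5, 4]

q1, md, q3 = quantiles(data, n=4, method="exclusive")

summary = {
    "mean": mean(data),
    "variance": variance(data),        # divides by n - 1
    "std_dev": round(stdev(data), 4),
    "mode": multimode(data),
    "range": max(data) - min(data),
    "Q1": q1,
    "median": md,
    "Q3": q3,
}

for name, value in summary.items():
    print(f"{name:<10}{value}")
```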
