Basic Statistics For Economists Lecture 2

Basic statistics for economists
Spring 2018 C-D
Department of Statistics
NCT 1-2; (16.1)
Today
Describe data (variables) by means of
● Tables
– Distributions, frequency distributions (frequency = number of)
● Diagrams, graphics
– One variable at a time, univariate
– Categorical and numerical variables
● Numerically
– Location (“where are the values typically located”)
– Variation, dispersion (“how spread out are they”)
20 March 2018 Michael Carlson, Dep. of Statistics 2

From L1
Classification of variables
Variables
– Categorical (qualitative)
– Numerical (quantitative): Discrete or Continuous

From L1
Scale level
Numerical, quantitative data:
● Ratio: differences and ratios are well-defined, true zero exists
● Interval: differences are well-defined but not ratios, true zero does not exist
Categorical, qualitative data:
● Ordinal: ordered categories (ranking order)
● Nominal: categories but no natural ordering exists

Frequency distribution – one variable
● Number of objects/observations that share the same property:
$n_k$ = no. of objects with property $k$
● The entire set of all $n_k$ over all possible $k$ is called the
frequency distribution (sv. frekvensfördelningen)
● If there are in total $n$ objects and in total $C$ different categories we have
$\sum_k n_k = n_1 + n_2 + \dots + n_C = n$
● Relative frequencies (%): $100 \cdot \frac{n_k}{n}$
● Works for categorical and discrete numerical variables – but how do we
deal with continuous variables?

Frequency tables – count the numbers
● Categorical – nominal or ordinal
● Numerical – discrete or class-divided continuous
Count the number that fall into each defined category:
Flavor        Frequency   Relative frequency %
Chocolate         70          35,0 %
Vanilla           50          25,0 %
Strawberry        45          22,5 %
Raspberry         30          15,0 %
Licorice           5           2,5 %
Sum              200         100 %
Relative frequency % = $100 \cdot \frac{n_k}{n}$
Nominal scale: Pareto ordering, largest first and smallest last (NCT p. 32-33)
Note! Frequencies are always numerical but the variable is not necessarily numerical!
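A frequency table like this one can be built from raw observations in a few lines. The sketch below is only an illustration: the list `observations` is hypothetical raw data constructed to match the counts in the table, and the variable names are not from the lecture.

```python
from collections import Counter

# Hypothetical raw data: one flavor label per respondent, matching the table's counts.
observations = (
    ["Chocolate"] * 70 + ["Vanilla"] * 50 + ["Strawberry"] * 45
    + ["Raspberry"] * 30 + ["Licorice"] * 5
)

counts = Counter(observations)   # n_k for each category k
n = sum(counts.values())         # total number of observations

# Pareto ordering: largest frequency first, smallest last.
table = [(flavor, n_k, 100 * n_k / n) for flavor, n_k in counts.most_common()]

for flavor, n_k, pct in table:
    print(f"{flavor:<12}{n_k:>5}{pct:>8.1f} %")
print(f"{'Sum':<12}{n:>5}{sum(p for _, _, p in table):>8.1f} %")
```

`Counter.most_common()` sorts by frequency, which suits a nominal variable; for an ordinal variable the rows would instead be listed in rank order.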

Frequency tables, cont.
Categorical, ordinal scale:
Grade   Frequency   Relative frequency %
A           30          15,0 %
B           56          28,0 %
C           80          40,0 %
D           20          10,0 %
E           14           7,0 %
Sum        200         100 %
Arrange in order of ranking! Pareto not recommended!

Discrete, ratio scale:
No. of points            0      1      2      3      4    Sum
Frequency                7     42     98     63     70    280
Relative frequency %   2,5   15,0   35,0   22,5   25,0    100
Arrange in order of numerical magnitude! Pareto not recommended!

Class separated continuous variable
● Continuous numerical variables (or discrete with many values)
may be grouped into classes, bins – i.e. intervals
● Separation into classes, categorization (sv. klassindelning)
● Classes and class widths must be defined

ex. (0-4,99) (5-9,99) (10-19,99) (20 - ) ⟵ (open class, ≥20)
● Same class width or varying width? What does NCT say?
● Summarize in a table - ordered by magnitude of values of the
class separated continuous variable

Class separated continuous variable
● Cumulative – indicates the total number of observations whose
values are (e.g.) less than the upper limit of each class
Income per    Frequency   Relative   Cumulative   Cumulative
month, tkr                freq.      freq.        relative freq.
< 20              20      10,0 %        20          10,0 %
21 – 40           40      20,0 %        60          30,0 %
41 – 60           74      37,0 %       134          67,0 %
61 – 80           40      20,0 %       174          87,0 %
81 – 100          18       9,0 %       192          96,0 %
101 – 120          6       3,0 %       198          99,0 %
121 – 140          2       1,0 %       200         100,0 %
> 140              0       0,0 %       200         100,0 %
Sum              200     100 %
8 bins. The last column holds the accumulated relative frequencies.
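The cumulative columns are running totals of the frequency column. A minimal sketch, using the class frequencies from the income table (the variable names are illustrative only):

```python
from itertools import accumulate

# Class frequencies from the income table (tkr per month), lowest class first.
classes = ["< 20", "21-40", "41-60", "61-80", "81-100", "101-120", "121-140", "> 140"]
freq = [20, 40, 74, 40, 18, 6, 2, 0]

n = sum(freq)
cum_freq = list(accumulate(freq))                # running total of frequencies
rel_freq = [100 * f / n for f in freq]           # relative frequencies in %
cum_rel_freq = [100 * c / n for c in cum_freq]   # cumulative relative frequencies in %

for row in zip(classes, freq, rel_freq, cum_freq, cum_rel_freq):
    print("{:<10}{:>5}{:>8.1f} %{:>6}{:>8.1f} %".format(*row))
```

The cumulative relative frequencies never decrease and end at 100 %, which is the property the ogive and step function rely on.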

Graphical presentation – one variable
Frequencies (absolute or relative)
● Bar charts (sv. stapeldiagram)
– categorical, nominal and ordinal; discrete numerical
– ordered in the same way as we did for freq. tables
● Pie charts
– categorical, nominal
● Bar charts with separated discrete bars or lines (sv. stolpdiagram)
– discrete numerical with few values
● Histogram (adjacent bars, intervals)
– class separated continuous variable, or discrete with many values

Diagram types – qualitative, categorical
[Figure: bar chart (nominal scale, Pareto ordered) and pie chart (nominal scale) of the flavors Raspberry, Blueberry, Chocolate, Vanilla and Licorice]
Pie chart: start at 12 o’clock and move clockwise, starting with the largest, second largest and so on …
[Figure: bar chart (ordinal scale, not Pareto ordered) with the categories Very poor, Poor, OK, Good, Very good]

Diagram types - numerical discrete
Bar chart (sv. stolp- or stapeldiagram)
Ratio scale (also interval scale)
[Figure: bar chart; x-axis: no. of rooms per apartment (1 to 6), ordered by magnitude; y-axis: no. of apartments (frequencies)]

Histogram - numerical continuous
● Histograms are used for continuous variables
● Class widths are defined (bins)
● Inclusive and non-overlapping – each observation belongs to exactly one class
● The frequency of a class is represented by the area of the bar, not
its height (see NCT p. 52-53)
– however, if class widths are the same for all bins the heights
are proportional to the areas and thus the frequencies
● Open classes (e.g. >65) are indicated with e.g. dotted lines
– we don’t know where it ends and thus neither the area!

Histogram
[Figure: two histograms with frequency on the y-axis. Left: equal class widths, height is proportional to the area. Right: unequal class widths (not possible with Excel). Image by Qwfp at English Wikipedia, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=20290683]

Cumulative relative frequency - Ogive
[Figure: histogram with the ogive (cumulative percentages) drawn on top; left axis: frequency, right axis: cumulative %]
Note! Different axis scales: frequency on the left side, % on the right side.
Note! The red ogive is an increasing function (never decreasing) and never goes
higher than 100 %

Cumulative rel. freq. – Step function
● The increase at each step is equal to the height of the
corresponding bar in the histogram
[Figure: relative-frequency histogram over the classes 0-20, 21 – 40, …, 121 – 140, > 140 next to the cumulative step function; a bar of height A gives an increase of A at that step, a bar of height B an increase of B]
● Increases (never decreases) and never goes above 100 %

Quick summary:
What type of diagram do we use to show a frequency distribution?
● Categorical, nominal: bar chart, Pareto-ordered; pie chart
● Categorical, ordinal: bar chart, ordered by rank (lowest - highest)
● Numerical, discrete, few values: bar chart, one bar for each discrete value
● Numerical, discrete, many values: histogram, divided into classes
(approximates continuous)
● Numerical, continuous: histogram, divided into classes
● Visualizing a cumulative frequency distribution?
Ogive or step function (discrete and continuous)

Skip this!
Stem-and-Leaf Displays
● sv. stambladdiagram
● Provides exact values and visualizes the distribution
● In this example
– stems = tens (10, 20, …)
– leaves = ones (0, 1, 2,…, 9)
● Not very common these days; it dates from before the era of graphical printing
[Figure: stem-and-leaf display with the stems 1, 2, 3, 4, 5 along the bottom and the leaves stacked above each stem]

NCT 16.1
Time series, observations over time

● Line chart, time series plot
– visualizes changes over time
– time on the x-axis, observed values on the y-axis
– points are connected
Price
Year   Milk   Soda pop
1975   1,34   1,20
1976   1,43   1,33
1977   1,63   1,38
1978   1,92   1,46
1979   2,07   1,51
1980   2,41   1,66
1981   2,87   1,84
1982   3,38   2,01
[Figure: line chart of the prices of milk and soda pop, 1975 to 1982; x-axis: year, y-axis: price (0 to 4)]

Numerical measures - summaries
● Define numerical measures that summarize the most important
properties of a set of observations
● Location – where are the observations?

– Around 4, 25, 100 or 10 000?
– Measures of location, central tendency
● Dispersion – how spread out are the observations from the
central location?
– about 2-8 or 4-500? Or in the interval 100±20?
– Many close to the center? Or in the “tails” i.e. endpoints?
– Measures of variability

The data and notation
● Denote the variable by $x$ (or some other letter $y, z, u, v, \dots$)
● $n$ observations, sample size ($N$ population size)
● Denote by indexing with $i = 1, 2, 3, \dots, n-1, n$ (labels)
● Value of the $i$th observation of the variable $x$ is denoted by $x_i$
● The entire set of observed values may be denoted as
$\{x_1, x_2, x_3, \dots, x_{n-1}, x_n\}$

Measure of Location: Mean
● Arithmetic sample mean (sv. medelvärde), read "$x$-bar"
● Sum all and divide by the number:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
● Mean value is sensitive to extreme values:
ex. {2, 3, 4, 5} ⇒ $\bar{x}$ = 3,5
ex. {2, 3, 4, 25} ⇒ $\bar{x}$ = 8,5
ex. {22, 23, 24, 25} ⇒ $\bar{x}$ = 23,5
● Population mean often denoted $\mu$ or $\mu_x$ ("mu", sv. "my"):
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$

More on means – grouped values
● $n$ observations, distributed such that $n_k$ of them share
the same value $x = k$
● E.g. $n_0$ zeroes ($x = 0$), $n_1$ ones ($x = 1$), $n_2$ twos ($x = 2$), … etc. up
to $n_C$ with value $x = C$
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \sum_{k=0}^{C} n_k \cdot k = \sum_{k=0}^{C} \frac{n_k}{n} \cdot k$
where $\frac{n_k}{n}$ is the proportion with value $k$
● Ex. 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3 yields
$\bar{x} = \sum_{k=0}^{3} \frac{n_k}{n} k = \frac{1}{12}(3 \cdot 0 + 2 \cdot 1 + 3 \cdot 2 + 4 \cdot 3) = \frac{20}{12} = \frac{5}{3} = 1,6667$
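The grouped formula gives the same result as summing every observation. A small sketch with the example data, using exact fractions so the two results can be compared directly (variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction

data = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
n = len(data)

counts = Counter(data)   # n_k: how many observations equal k

# Ungrouped: sum every observation and divide by n.
mean_raw = Fraction(sum(data), n)

# Grouped: weight each distinct value k by its proportion n_k / n.
mean_grouped = sum(Fraction(n_k, n) * k for k, n_k in counts.items())

print(mean_raw, mean_grouped, float(mean_grouped))
```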

Not required but useful to know
Geometric mean
● Monthly interest on an investment over 6 months:
10%, 7%, -2%, 11%, 8%, 4%
● Total growth over the entire period is:
1,10 · 1,07 · 0,98 · 1,11 · 1,08 · 1,04 = 1,4381
i.e. +43,81%
● Geometric mean: $\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n}$
● Here $\bar{x}_g = \sqrt[6]{1,4381} = 1,0624$
i.e. average interest per month = 6,24%
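The calculation can be reproduced as follows. This is a sketch of the arithmetic only; the arithmetic mean of the growth factors is included as an assumption-free comparison, not as part of the lecture.

```python
import math

# Monthly returns in percent, from the example above.
returns_pct = [10, 7, -2, 11, 8, 4]

# Convert each return to a growth factor, e.g. 10 % -> 1.10.
factors = [1 + r / 100 for r in returns_pct]

total_growth = math.prod(factors)               # growth over all six months
geo_mean = total_growth ** (1 / len(factors))   # n-th root of the product
arith_mean = sum(factors) / len(factors)        # for comparison only

print(round(total_growth, 4), round(geo_mean, 4), round(arith_mean, 4))
```

Compounding the geometric mean six times recovers the total growth, which the arithmetic mean of the factors does not.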

Location, central tendency: Median
● The median separates a numerical dataset in half
● 50% of the observations lie on either side of the median
● Arrange the observations in increasing order, smallest-largest
n even ⇒ median = mean of the two in the middle
n odd ⇒ median = the middle value
ex. {2, 3, 4, 5} ⇒ median = 3,5
ex. {2, 3, 4, 25} ⇒ median = 3,5
ex. {2, 3, 4, 25, 135} ⇒ median = 4
● Less sensitive to extreme values than the mean

Location: Mode
● Sv. typvärde
● The most frequently occurring value; largest frequency
ex. {4, 2, 3, 3, 5, 1, 3, 5} ⇒ Mode = 3 (unimodal)
ex. {5, 2, 3, 3, 5, 1, 3, 5} ⇒ Mode = 3 and 5 (bimodal)
● Useful for categorical variables (nominal and ordinal scales)
ex. {b, a, c, b, d, b, a, e} ⇒ Mode = b

Variability: Range
● The size of the observed range of values, the interval within
which all observations lie
● Difference between the largest and smallest values
Range = Max – Min
● Sensitive to extreme values

Variability: Quartiles – Interquartile Range
Arrange the observations in increasing order:
● Q1 = 1st quartile
25 % of observations below, 75 % above
● Q3 = 3rd quartile
75 % of observations below, 25 % above
● IQR = Interquartile Range = Q3 – Q1 (sv. kvartilavstånd)
● 50 % of the observations lie in an interval that is IQR wide

See NCT p. 64-67, Formulary p. 2
Percentiles
Let $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ denote the ordered sample, ordered by size from
the smallest value $x_{(1)}$ to the largest $x_{(n)}$
● Let $a$ = integer part of $\frac{p}{100}(n + 1)$
● Let $b$ = decimal part of $\frac{p}{100}(n + 1)$
● $p$th percentile = $x_{(a)} + b \cdot (x_{(a+1)} - x_{(a)})$
Ex. {11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40}, $n = 12$
40th percentile: $\frac{40}{100}(n + 1) = (12 + 1) \cdot 0,4 = 5,2$ ⇒ $a = 5$ and $b = 0,2$
40th percentile = $x_{(5)} + 0,2 \cdot (x_{(6)} - x_{(5)}) = 17 + 0,2 \cdot (18 - 17) = 17,2$
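The percentile rule translates directly into a short function. The sketch below follows the (n + 1)·p/100 rule stated above; the function name `percentile` and the handling of positions outside 1..n are assumptions added for the example, not part of the lecture.

```python
def percentile(data, p):
    """p-th percentile using the (n + 1) * p / 100 position rule."""
    xs = sorted(data)
    n = len(xs)
    position = (n + 1) * p / 100
    a = int(position)   # integer part
    b = position - a    # decimal part
    # Assumption (not in the lecture): clamp positions that fall outside 1..n.
    if a < 1:
        return xs[0]
    if a >= n:
        return xs[-1]
    # xs is 0-indexed, so x_(a) is xs[a - 1] and x_(a+1) is xs[a].
    return xs[a - 1] + b * (xs[a] - xs[a - 1])


sample = [11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40]
print(percentile(sample, 40))   # 40th percentile
print(percentile(sample, 50))   # median
```

Other software may use a different interpolation rule, so results can differ slightly between tools for small samples.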

Median - example
● n = 12 observations ordered by size:
{11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40}
● Start with (n+1) = 13 and multiply by 0,5 because the median is the halfway point
● (n+1)·0,5 = 6,5 i.e. in between the 6th and 7th
● median = md = P50 = (18+20)/2 = 19
= mean value of the 6th and the 7th observations

Quartiles - example
● 1st quartile = 25th percentile
(n+1)·0,25 = 3,25 ⇒ a = 3, b = 0,25, i.e. between the 3rd and 4th observations (14 and 15)
Δ = 15 – 14 = 1
Q1 = 14 + 0,25·1 = 14,25
● 3rd quartile = 75th percentile
(n+1)·0,75 = 9,75 ⇒ a = 9, b = 0,75, i.e. between the 9th and 10th observations (21 and 23)
Δ = 23 – 21 = 2
Q3 = 21 + 0,75·2 = 22,50
● IQR = Q3 – Q1 = 22,50 – 14,25 = 8,25

With Excel
● The data from the example are in cells A1–A12
● Write the following functions in any empty cell:
=MIN(A1:A12)
=QUARTILE.EXC(A1:A12;1)
=MEDIAN(A1:A12)
=QUARTILE.EXC(A1:A12;3)
=MAX(A1:A12)
English and Swedish versions of Excel functions, see e.g.
http://www.exceldepartment.com/excelkurs/extramaterial/excelfunktioner-svenska-engelska/

Box-and-whisker plots – visual summary
● We need:
– smallest and largest values - min and max
– median, 1st and 3rd quartiles - Md, Q1 and Q3
”Five-number summary” NCT p. 65
Definition of extreme values (according to Tukey):
● Outliers: values that lie more than 1,5⨯IQR below Q1 or above Q3
● Extreme outliers: values that lie more than 3⨯IQR below Q1 or above Q3
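Tukey's limits can be checked on the twelve observations used in the percentile example. This is a sketch, assuming Python's `statistics.quantiles` with `method="exclusive"`, which interpolates at position (n + 1)·p and so matches the quartile rule of this lecture; the helper `classify` is a name made up for the example.

```python
import statistics

sample = [11, 12, 14, 15, 17, 18, 20, 21, 21, 23, 30, 40]

# method="exclusive" interpolates at position (n + 1) * p, like the slides.
q1, median, q3 = statistics.quantiles(sample, n=4, method="exclusive")
iqr = q3 - q1


def classify(x):
    """Label a value using Tukey's 1.5 x IQR and 3 x IQR limits."""
    distance = max(q1 - x, x - q3)   # how far x lies outside the box
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "outlier"
    return "regular"


five_number_summary = (min(sample), q1, median, q3, max(sample))
flagged = {x: classify(x) for x in sample if classify(x) != "regular"}
print(five_number_summary, flagged)
```

In this sample only the largest value lies more than 1,5⨯IQR above Q3, and nothing lies beyond 3⨯IQR.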

Box plots, cont.
[Figure: box plot on an axis from 20 to 80; labels: ”Five-number summary” (Min, Q1, Median, Q3, Max) and Extreme values; the box is IQR wide, with 1,5 × IQR marked on either side of it]

NCT p. 73-74
Variability: Variance
● Average squared distance to the mean
● Sample and population variances:
$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$\sigma_x^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$ (”sigma”)
– Note! For samples, divide by $n - 1$ rather than $n$
– Unit of measurement is transformed to square units
● Standard deviation (sv. standardavvikelse):
$s_x = \sqrt{s_x^2}$ and $\sigma_x = \sqrt{\sigma_x^2}$
– Restores unit of measurement
● Coefficient of Variation: read on your own in NCT p. 75

Variance – alternative formulas
● Sample variance
$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
Excel: ’=VAR.S(…)’
● Population variance
$\sigma_x^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N}$
Excel: ’=VAR.P(…)’
● The second expression in each line is the shortcut formula, sometimes easier to use

Variance
● Four observations {2, 3, 5, 8}; mean $\bar{x}$ = 4,5
● Distance to mean $x_i - \bar{x}$, square them and sum:
                                                 Sum
$x_i$                  2       3       5       8      18
$x_i - \bar{x}$      -2,5    -1,5     0,5     3,5      0
$(x_i - \bar{x})^2$   6,25    2,25    0,25   12,25    21
$x_i^2$                4       9      25      64     102
[Figure: the four observations plotted on an axis from 0 to 9 with their distances to the mean]
● Calculate the variance:
– divide by $n - 1 = 3$ or $N = 4$? Here the data are treated as a sample, so divide by $n - 1$
$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{21}{3} = 7$
$s_x^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = \frac{102 - 4 \cdot 4,5^2}{4 - 1} = \frac{21}{3} = 7$
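Both routes to the sum of squares can be verified numerically. A minimal sketch with the four observations (variable names are illustrative; the population variance is shown only to contrast the two divisors):

```python
data = [2, 3, 5, 8]
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean.
ss_definition = sum((x - mean) ** 2 for x in data)

# Shortcut: sum of squares minus n times the squared mean.
ss_shortcut = sum(x ** 2 for x in data) - n * mean ** 2

sample_variance = ss_definition / (n - 1)   # divide by n - 1 for a sample
population_variance = ss_definition / n     # divide by N if the data were a whole population

print(ss_definition, ss_shortcut, sample_variance, population_variance)
```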

Quick quiz:
Which of the following statements about variances are true?
1. Variances are always positive. False, can be zero
2. Variances can never be negative. True
3. Variances can never equal zero. False
4. If you add 10 to every observation in the data, the variance
will be unaffected. True
5. If you multiply every observation by 10, the variance will be
100 times larger. True
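Statements 3 to 5 are easy to check numerically. The sketch below reuses the data {2, 3, 5, 8} from the variance example and Python's `statistics.variance` (sample variance); the constant data set is a made-up illustration of zero spread.

```python
import statistics

data = [2, 3, 5, 8]
base = statistics.variance(data)                        # sample variance

shifted = statistics.variance([x + 10 for x in data])   # add 10 to every value
scaled = statistics.variance([x * 10 for x in data])    # multiply every value by 10
constant = statistics.variance([4, 4, 4, 4])            # no spread at all

print(base, shifted, scaled, constant)
```

Adding a constant moves every observation and the mean by the same amount, so the deviations are unchanged; multiplying by 10 scales every deviation by 10 and every squared deviation by 100.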

NCT p. 75-76
Chebyshev’s theorem and the Empirical rule

● Provides a description of how spread out our observations are, in terms of
the standard deviation (variance):
Rule          μ ± σ           μ ± 2σ          μ ± 3σ
Chebyshev:    at least 0 %    at least 75 %   at least 88,89 %   Guaranteed
Empirical:    ca 68 %         ca 95 %         ca 100 %           Under some conditions (”bell-shaped”)
● Compare to Q1, Q3 and IQR
[Figure: distribution with 95 % of the observations within μ ± 2σ]
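The Chebyshev row follows from the standard statement of the theorem: at least 1 − 1/k² of the observations lie within k standard deviations of the mean. The formula is not spelled out on the slide, so the sketch below is an added illustration, and the function name is made up.

```python
def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k ** 2)


for k in (1, 2, 3):
    print(f"within {k} sd: at least {100 * chebyshev_lower_bound(k):.2f} %")
```

For k = 1 the bound is 0 %, which is why Chebyshev says nothing useful about μ ± σ, whereas the empirical rule gives about 68 % for bell-shaped data.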

NCT p. 91
Skewness – sv. snedhet - useful knowledge
● If the distribution looks as if it has been “pulled out” to one
side, we say the distribution is skewed (sv. sned)
● Symmetric if it is equally distributed on both sides (non-skewed)
Left skewed: Mean ≠ Median ≠ Mode
Symmetric: Mean = Median = Mode
Right skewed: Mode ≠ Median ≠ Mean

Variable type & descriptive measures
Variables
● Categorical (nominal and ordinal scale)
– Location: mode (sv. typvärde)
– Variability: no. of categories/levels
● Numerical, discrete or continuous (interval and ratio scale)
– Location: mode, median (Q1 & Q3), mean
– Variability: range, IQR, variance & standard deviation

Next time …
… we’ll continue with descriptive statistics and discuss how to
describe and study two variables:
● Tables and graphs etc.
Especially the relationship between two numerical variables

● Graphically
– scatter plots
● Measures of Relationships between two variables:
– covariance and correlation coefficient

Exercise: DIY
$i$                    1    2    3    4    5    6    7    8    9   10   Sum
$x_i$                  5    2    3    6    5    2    5    3    5    4    40
Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{10}(5 + 2 + \dots + 4) = \frac{40}{10} = 4,0$
$x_i - \bar{x}$        1   −2   −1    2    1   −2    1   −1    1    0     0
$(x_i - \bar{x})^2$    1    4    1    4    1    4    1    1    1    0    18
Variance: $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{9}(1 + 4 + 1 + \dots + 0) = \frac{18}{9} = 2,0$
Standard deviation: $s_x = \sqrt{s_x^2} = \sqrt{2,0} = 1,4142…$

Exercise: DIY, cont.
$i$        1    2    3    4    5    6    7    8    9   10   Sum
$x_i$      5    2    3    6    5    2    5    3    5    4    40
Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{10}(5 + 2 + \dots + 4) = \frac{40}{10} = 4,0$
$x_i^2$   25    4    9   36   25    4   25    9   25   16   178
Variance, alt. formula: $s_x^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right) = \frac{1}{9}(178 - 10 \cdot 4^2) = \frac{178 - 160}{9} = 2,0$
Mode = 5
Range = Max − Min = 6 − 2 = 4

Exercise: DIY, cont.
$n = 10 \Rightarrow n + 1 = 11$
$(i)$       1    2    3    4    5    6    7    8    9   10   Sum
$x_{(i)}$   2    2    3    3    4    5    5    5    5    6    40
Median: 50% of $n + 1$ = 5,5 ⇒ $a = 5$, $b = 0,5$
$x_{(5)} + 0,5 \cdot (x_{(6)} - x_{(5)}) = 4 + 0,5 \cdot (5 - 4) = 4,5$
Q1: 25% of $n + 1$ = 2,75 ⇒ $a = 2$, $b = 0,75$
$x_{(2)} + 0,75 \cdot (x_{(3)} - x_{(2)}) = 2 + 0,75 \cdot (3 - 2) = 2,75$
Q3: 75% of $n + 1$ = 8,25 ⇒ $a = 8$, $b = 0,25$
$x_{(8)} + 0,25 \cdot (x_{(9)} - x_{(8)}) = 5 + 0,25 \cdot (5 - 5) = 5$
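The hand calculations in the exercise can be cross-checked with Python's standard library. This is a sketch for verification only, assuming `statistics.quantiles` with `method="exclusive"`, which interpolates at position (n + 1)·p and so agrees with the percentile rule used in this lecture.

```python
import statistics

x = [5, 2, 3, 6, 5, 2, 5, 3, 5, 4]

mean = statistics.mean(x)
variance = statistics.variance(x)   # sample variance, divides by n - 1
std_dev = statistics.stdev(x)
mode = statistics.mode(x)
value_range = max(x) - min(x)

# method="exclusive" interpolates at position (n + 1) * p, as in the exercise.
q1, median, q3 = statistics.quantiles(x, n=4, method="exclusive")

print(mean, variance, round(std_dev, 4), mode, value_range, q1, median, q3)
```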
