Basic Tools

Basic Quantitative Tools (Prof. Campbell)
last updated: Friday, March 17, 2008

Overview: for each question, the tool to use and the data needed.

INFERENTIAL STATISTICS (inferring from a sample to a population using the laws of probability)

How confident can one be that the sample mean (or proportion) represents the population as a whole?
    confidence interval (mean) -- data needed: one interval variable
        inverse: given a specific confidence interval, what is the needed sample size?
    confidence interval (proportion) -- data needed: one nominal variable

Do differences found in a sample (a subset of the population) reflect differences in the population as a whole? (commonly used to generalize from survey results)
    chi-square -- data needed: two categorical variables
    difference of means -- data needed: an interval variable divided into two categories
    difference of proportions -- data needed: a nominal variable divided into two categories
    ANOVA (Analysis of Variance) -- data needed: an interval variable divided into three or more categories

What is the relationship between two variables?
    correlation analysis (including an example of ecological fallacy) -- data needed: two interval variables

How many total jobs (export + locally serving) are dependent on basic (export-based) jobs?
    Multiplier -- data needed: number of basic (export) jobs, number of total jobs

What is the relative concentration of local employment by sector?
    Location Quotients -- data needed: employment (total and by sector) for both the locality and the nation

How can we estimate interaction (e.g., trade, traffic) between two cities?
    Gravity Model -- data needed: population of two cities, distance, constant

How do we measure growth over time?
    Growth Rates (3 types) -- data needed: population levels over time

How do we compare costs and benefits (e.g., of a project) over time?
    Cost-benefit analysis -- data needed: quantified costs and benefits for each year, discount rate
Confidence Interval

calculate a confidence interval (with interval data)
that is, how confident are you that your sample estimate comes close to the population mean?

(enter data in yellow cells)

Data needed:
    sample mean (X)
    std dev of sample (s)
    sample size (n)
    value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

    \mu = \bar{X} \pm t_{.025} \frac{s}{\sqrt{n}}

Data: Hhd Income
    Case    Hhd Income
    1       24,000
    2       36,000
    3       12,000
    4       74,000
    5       46,000
    6       27,000
    7       23,000
    8       69,000
    9       107,000
    10      53,000
    11      29,000
    12      34,000
    13      43,000
    14      28,000
    15      24,000
    16      43,000

set the confidence level (2-tail): 0.05

    MEAN     42,000
    STDEV    24,105
    n        16
    t        2.131

SO: u = 42,000 +/- 12,845
    lower end of confidence interval    29,155
    upper end of confidence interval    54,845
    range                               25,690

[Chart: Confidence Interval, income axis from 20,000 to 100,000]
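The arithmetic on this sheet can be checked outside Excel. The following is a minimal sketch, assuming the sixteen household incomes listed above and taking the t value (2.131) directly from the sheet rather than computing it; the function name `confidence_interval` is illustrative only.

```python
import math
import statistics

# Household incomes from the sheet (cases 1 to 16).
incomes = [24000, 36000, 12000, 74000, 46000, 27000, 23000, 69000,
           107000, 53000, 29000, 34000, 43000, 28000, 24000, 43000]

# Two-tail t value for 15 degrees of freedom at the .05 level,
# copied from the sheet (assumption: not recomputed here).
T_025 = 2.131


def confidence_interval(data, t_value):
    """Return (mean, sample std dev, margin) for mean +/- t * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    margin = t_value * s / math.sqrt(n)
    return mean, s, margin


mean, s, margin = confidence_interval(incomes, T_025)
print(f"mean={mean:,.0f}  s={s:,.0f}  margin={margin:,.0f}")
print(f"interval: {mean - margin:,.0f} to {mean + margin:,.0f}")
```

The margin comes out a few dollars below the sheet's 12,845 because the sheet uses an unrounded t value.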
Confidence Interval (summary data)

calculate a confidence interval
that is, how confident are you that your sample estimate comes close to the population mean?
Here we will skip using the raw data and instead calculate with the summary data (mean, std dev., n)

(enter data in yellow cells)

Data needed:
    sample mean (X)
    std dev of sample (s)
    sample size (n)

    \mu = \bar{X} \pm t_{.025} \frac{s}{\sqrt{n}}

set the confidence level (2-tail): 0.05

    MEAN     42,000
    STDEV    5,000
    n        384
    t        1.966

SO: u = 42,000 +/- 502
    lower end of confidence interval    41,498
    upper end of confidence interval    42,502
    range                               1,003

[Chart: Confidence Interval, income axis from 20,000 to 100,000]
Sample Size

calculate a minimum sample size needed to achieve a specific confidence interval range
Here we will skip using the raw data and instead calculate with the summary data (mean, std dev.)

(enter data in yellow cells)

Data needed:
    sample mean (X)
    std dev of sample (s)
    c (the +/- half of the confidence interval)
    confidence level (2-tail)

    MEAN     42,000
    STDEV    5,000
    c        500
    t        1.960
    confidence level (2-tail)    0.05

here is the formula to calculate a confidence interval:

    \mu = \bar{X} \pm t_{.025} \frac{s}{\sqrt{n}}

so the +/- term is

    c = t_{.025} \frac{s}{\sqrt{n}}

solving for n (sample size) leads to this equation (so, to estimate sample size, you need to know Stdev, the confidence interval, and the value of t):

    n = \left( \frac{t_{.025} \, s}{c} \right)^2

given values of stdev and c and confidence level, we calculate "n":
    sample size needed    384

SO: u = 42,000 +/- 500
    lower end of confidence interval    41,500
    upper end of confidence interval    42,500
    range                               1,000

NOTES:
1. For the value of "t", we simply assumed a large sample size (t --> Z), e.g., for a 95% confidence interval (2-tailed), t = 1.96.
2. We are also assuming a large population size (M), so that N/M --> 0.

[Chart: Confidence Interval, income axis from 20,000 to 100,000]
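The sample-size step can be sketched in a few lines. This assumes the sheet's inputs (standard deviation 5,000, a +/- margin of 500, and t = 1.96 for a large sample); the function name `sample_size` is illustrative.

```python
def sample_size(t_value, s, c):
    """Sample size n such that t * s / sqrt(n) equals the margin c."""
    return (t_value * s / c) ** 2


n_exact = sample_size(1.96, 5000, 500)
print(f"n = {n_exact:.2f}")  # the sheet displays this rounded to 384
```

In practice one would usually round up rather than to the nearest integer, so that the margin is not exceeded; the sheet shows the rounded value.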
Confidence Interval (proportions)

calculate a confidence interval using proportions (nominal data)
one nominal variable (proportions)

(enter data in yellow cells)

Data needed:
    sample proportion (P)
    sample size (n)

for large n, the population proportion is:

    p = P \pm 1.96 \sqrt{\frac{P(1 - P)}{n}}

confidence level (2-tail): 0.05

    P    50%
    n    100
    t    1.984

                                        value    in percent
SO: p = 0.500 +/-                       0.099    9.9%
    lower end of confidence interval    0.401    40.1%
    upper end of confidence interval    0.599    59.9%
    range                               0.198    19.8%

[Chart: Confidence Interval, proportion axis from 0% to 100%]
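A short sketch of the proportion interval, assuming P = 50% and n = 100 as on the sheet. The large-sample formula uses 1.96; the sheet's displayed margin of 0.099 appears to use the t value 1.984 instead, so both are shown. The function name `proportion_ci` is illustrative.

```python
import math


def proportion_ci(p, n, critical=1.96):
    """Return (lower, upper, margin) for p +/- critical * sqrt(p(1-p)/n)."""
    margin = critical * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin


low_z, high_z, margin_z = proportion_ci(0.50, 100)          # large-n value 1.96
low_t, high_t, margin_t = proportion_ci(0.50, 100, 1.984)   # t value from the sheet
print(f"z-based: {low_z:.3f} to {high_z:.3f} (+/- {margin_z:.3f})")
print(f"t-based: {low_t:.3f} to {high_t:.3f} (+/- {margin_t:.3f})")
```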
Chi-Square

does the distribution of outcomes (observed) significantly differ from a random distribution (expected)?

CHI-SQUARE TEST (EXCEL: FUNCTION)

(enter data in yellow cells)

ACTUAL (OBSERVED)
              city    suburb    rural    total
    strong    2       1         1        4
    medium    1       2         1        4
    weak      1       1         2        4
    total     4       4         4        12

PREDICTED/EXPECTED (based on multiplying row and column totals)
              city      suburb    rural     total
    strong    1.3333    1.3333    1.3333    4
    medium    1.3333    1.3333    1.3333    4
    weak      1.3333    1.3333    1.3333    4
    total     4         4         4         12

Chi-square test (Calculated by Excel): "CHITEST"
    ###    (probability of this sample outcome if no difference in population)
    range: 0 to 1

Difference between predicted and actual
              city       suburb     rural      total
    strong    -0.6667    0.3333     0.3333     0
    medium    0.3333     -0.6667    0.3333     0
    weak      0.3333     0.3333     -0.6667    0
    total     0          0          0          0

    \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}

    f_o = observed frequencies
    f_e = expected frequencies

[Chart: observed vs. expected frequencies]
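The expected counts and the chi-square statistic for the 3 x 3 table can be reproduced as follows. This is a sketch under stated assumptions: the observed counts are the ones on the sheet, and the p-value uses a closed-form upper-tail formula that is valid only for an even number of degrees of freedom (here 4). It stands in for Excel's CHITEST, whose displayed result is not legible in the source. Function names are illustrative.

```python
import math

# Observed counts: rows = strong, medium, weak; columns = city, suburb, rural.
observed = [[2, 1, 1],
            [1, 2, 1],
            [1, 1, 2]]


def chi_square(table):
    """Return (statistic, degrees of freedom, expected table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x); closed form that holds only when df is even."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of d.f.")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


stat, df, expected = chi_square(observed)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.3f}, d.f. = {df}, p = {p_value:.3f}")
```

With so few cases the expected counts are far below the usual rule of thumb of about 5 per cell, so the p-value is illustrative only.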
t distribution

The t distribution is used for hypothesis testing with small samples (e.g., smaller than about 100 cases).
The t distribution is similar to the z distribution, but is "flatter" because of the smaller sample size.
When the sample size gets large (e.g., over 50-100), the t distribution approaches that of the Z distribution (a normal curve).

Calculator cells:
    d.f.     5
    tails    2
    1.000    probability of this outcome if no difference in population

Probabilities and t-scores for various degrees of freedom (two-tail test)

             Degrees of freedom
    t        5        10       50       1000
    0.0      1.000    1.000    1.000    1.000
    0.1      0.924    0.922    0.921    0.920
    0.2      0.849    0.845    0.842    0.842
    0.3      0.776    0.770    0.765    0.764
    0.4      0.706    0.698    0.691    0.689
    0.5      0.638    0.628    0.619    0.617
    0.6      0.575    0.562    0.551    0.549
    0.7      0.515    0.500    0.487    0.484
    0.8      0.460    0.442    0.427    0.424
    0.9      0.409    0.389    0.372    0.368
    1.0      0.363    0.341    0.322    0.318
    1.1      0.321    0.297    0.277    0.272
    1.2      0.284    0.258    0.236    0.230
    1.3      0.250    0.223    0.200    0.194
    1.4      0.220    0.192    0.168    0.162
    1.5      0.194    0.165    0.140    0.134
    1.6      0.170    0.141    0.116    0.110
    1.7      0.150    0.120    0.095    0.089
    1.8      0.132    0.102    0.078    0.072
    1.9      0.116    0.087    0.063    0.058
    2.0      0.102    0.073    0.051    0.046
    2.1      0.090    0.062    0.041    0.036
    2.2      0.079    0.052    0.032    0.028
    2.3      0.070    0.044    0.026    0.022
    2.4      0.062    0.037    0.020    0.017
    2.5      0.054    0.031    0.016    0.013
    2.6      0.048    0.026    0.012    0.009
    2.7      0.043    0.022    0.009    0.007
    2.8      0.038    0.019    0.007    0.005
    2.9      0.034    0.016    0.006    0.004
    3.0      0.030    0.013    0.004    0.003

[Chart: Probabilities (0.000 to 1.000) against standardized sample differences (t-scores), one curve for each of 5, 10, 50 and 1000 degrees of freedom]

Chart notes:
    the 0.05 level is by convention used as the threshold of statistical significance (though sometimes we use an even more strict level, such as 0.01 or even 0.001)
    the larger the sample size, the lower the value of the critical t ...
    when the sample size gets large (e.g., over 50 - 100), then the critical t level (.05, 2 tail) approaches 1.96
difference of means

Small Standard Deviation

    Case        Male Income    Female Income
    Factor      50             45
    1           69,000         49,000
    2           77,000         67,000
    3           46,000         69,000
    4           59,000         64,000
    5           55,000         30,000
    6           50,000         68,000
    7           38,000         73,000
    8           63,000         61,000
    9           50,000         61,000
    10          56,000         48,000
    11          74,000         72,000
    12          50,000         57,000
    Mean        57,250         59,917
    Std Dev.    11,702         12,428

[Chart: male and female incomes with group means, 20,000 to 100,000]

t-Test: Two-Sample Assuming Equal Variances
                                    Male Income    Female Income
    Mean                            57250          59916.66667
    Variance                        136931818.2    154446969.7
    Observations                    12             12
    Pooled Variance                 145689393.9
    Hypothesized Mean Difference    0
    df                              22
    t Stat                          -0.541
    P(T<=t) one-tail                0.297
    t Critical one-tail             1.717
    P(T<=t) two-tail                0.594
    t Critical two-tail             2.074
    Result: fail to reject H0

Larger Standard Deviation

    Case        Male Income    Female Income
    Factor      80             75
    1           40,000         72,000
    2           42,000         34,000
    3           83,000         65,000
    4           100,000        34,000
    5           100,000        86,000
    6           86,000         86,000
    7           104,000        67,000
    8           70,000         64,000
    9           37,000         79,000
    10          62,000         78,000
    11          88,000         85,000
    12          72,000         83,000
    Mean        73,667         69,417
    Std Dev.    24,092         18,372

[Chart: male and female incomes with group means, 20,000 to 100,000]

t-Test: Two-Sample Assuming Equal Variances
                                    Male Income    Female Income
    Mean                            73666.6667     69416.6667
    Variance                        580424242      337537879
    Observations                    12             12
    Pooled Variance                 458981061
    Hypothesized Mean Difference    0
    df                              22
    t Stat                          0.486
    P(T<=t) one-tail                0.316
    t Critical one-tail             1.717
    P(T<=t) two-tail                0.632
    t Critical two-tail             2.074
    Result: fail to reject H0
difference of means: formulas

Data Needed:
    number of cases for each of the two groups
    sample means for the two groups
    standard deviation for each group

Hypothesis (no difference between the two population means):

    \mu_1 = \mu_2, \quad \text{i.e.,} \quad \mu_1 - \mu_2 = 0

How to calculate t (note: EXCEL will do this all for you -- so you don't really need to use this formula):

    t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}

Since we hypothesize \mu_1 = \mu_2, or \mu_1 - \mu_2 = 0, the (\mu_1 - \mu_2) term drops out of the numerator of the equation for t:

    t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}

the formula for the standard error (the denominator of the equation for t):

    \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}

if we can assume the same standard deviation of the populations ("equal variance"):

    \sigma_{\bar{X}_1 - \bar{X}_2} = \sigma \sqrt{\frac{N_1 + N_2}{N_1 N_2}}
difference of means

Bigger DOM

    Case        Male Income    Female Income
    Factor      80             70
    1           37,000         55,000
    2           105,000        40,000
    3           52,000         82,000
    4           58,000         97,000
    5           107,000        56,000
    6           105,000        44,000
    7           96,000         39,000
    8           86,000         70,000
    9           82,000         77,000
    10          104,000        79,000
    11          71,000         44,000
    12          100,000        47,000
    Mean        83,583         60,833
    Std Dev.    23,922         19,441

[Chart: male and female incomes with group means, 20,000 to 120,000; regions labeled "fail to reject H0" and "reject H0"]

t-Test: Two-Sample Assuming Equal Variances
                                    Male Income    Female Income
    Mean                            83583.3333     60833.3333
    Variance                        572265152      377969697
    Observations                    12             12
    Pooled Variance                 475117424
    Hypothesized Mean Difference    0
    df                              22
    t Stat                          2.557
    P(T<=t) one-tail                0.009
    t Critical one-tail             1.717
    P(T<=t) two-tail                0.018
    t Critical two-tail             2.074
    Result: t Stat (2.557) exceeds t Critical two-tail (2.074), so reject H0
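The equal-variance t statistic can be recomputed from the raw incomes. The sketch below assumes the twelve male and twelve female incomes of the "Bigger DOM" data set and the pooled-variance form of the standard error; it is not Excel's implementation, only the same arithmetic. The function name `pooled_t` is illustrative.

```python
import math
import statistics

# "Bigger DOM" data set from the sheet.
male = [37000, 105000, 52000, 58000, 107000, 105000,
        96000, 86000, 82000, 104000, 71000, 100000]
female = [55000, 40000, 82000, 97000, 56000, 44000,
          39000, 70000, 77000, 79000, 44000, 47000]


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / std_error
    return t_stat, n1 + n2 - 2


t_stat, df = pooled_t(male, female)
T_CRITICAL_TWO_TAIL = 2.074  # value shown on the sheet for df = 22
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > T_CRITICAL_TWO_TAIL}")
```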
Diff of Proportions

A special case of the difference of means test

Do you Own a Car? 1 = yes, 0 = no

    Case          City Residents    Suburban Residents
    1             0                 0
    2             0                 0
    3             0                 1
    4             0                 1
    5             0                 1
    6             0                 1
    7             0                 1
    8             1                 1
    9             1                 1
    10            1                 1
    11            1                 1
    12            1                 1
    13            1                 1
    14            1                 1
    15            1                 1
    16            1                 1
    17            1                 1
    18            1                 1
    19            1                 1
    20            1                 1
    21            1                 1
    22            1                 1
    Mean          68.2%             90.9%
    n of cases    22                22
    degrees of freedom (n1 + n2 - 2)    42

[Chart: Percent of Residents Who Own a Car, City Residents vs. Suburban Residents, 0.0% to 100.0%]

t-score
    Numerator:       -22.7%
    pu               0.79545
    sqrt(pu*qu)      0.40337
    denominator =    0.12162
    t-score          -1.86871
    Prob-t           0.068649
    see Blalock, p. 234

for simplicity and conservatism, we could have also assumed that the population proportions are 50% and 50%

The central question: does the difference found in the sample reflect an actual difference among the entire population? [the research hypothesis]

Alternative: the difference is due merely to random sample variation, and there is no difference in the population as a whole. [the null hypothesis]

Remember: generally if |t| > 2 (i.e., if t < -2 or t > 2), then it is "statistically significant" at the .05 level. That is, there is less than a 5% chance that one could get this difference in the sample drawn from a population where there is no difference between city and suburban residents.
Diff of Proportions (2)

Here, if given just the mean, n of cases

    Mean          10.0%    20.0%
    n of cases    150      120
    degrees of freedom (n1 + n2 - 2)    268

t-score
    Numerator:       -10.0%
    pu               0.14444
    sqrt(pu*qu)      0.35154
    denominator =    0.04305
    t-score          -2.3226
    Prob-t           0.0209
    see Blalock, p. 234

Note that as the mean values deviate from 50%, we can be more accurate:
e.g., compare 10% to 20%, vs. 40% to 50%, or 80% to 90%
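The t-score steps listed on this sheet (pooled proportion pu, its standard deviation, the denominator) can be sketched as below. This assumes the summary inputs from the sheet (10% of 150 cases vs. 20% of 120 cases); the function name `diff_of_proportions_t` is illustrative.

```python
import math


def diff_of_proportions_t(p1, n1, p2, n2):
    """Return (t, pu, denominator) for a difference-of-proportions test."""
    pu = (p1 * n1 + p2 * n2) / (n1 + n2)      # pooled proportion
    qu = 1 - pu
    denominator = math.sqrt(pu * qu) * math.sqrt((n1 + n2) / (n1 * n2))
    return (p1 - p2) / denominator, pu, denominator


t_score, pu, denominator = diff_of_proportions_t(0.10, 150, 0.20, 120)
print(f"pu = {pu:.5f}, denominator = {denominator:.5f}, t = {t_score:.4f}")
```

The same function reproduces the first sheet's result when called with 15/22 and 20/22 and 22 cases each.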
ANOVA

AUTO MILES DRIVEN PER WEEK

    Case    City Residents    Suburban Residents    Rural Residents
    1       20                50                    40
    2       0                 80                    50
    3       50                90                    60
    4       100               350                   70
    5       70                240                   80
    6       35                120                   90
    7       12                90                    100
    8       150               80                    100
    9       120               70                    20
    10      0                 60                    30
    11      18                90                    40
    12      35                111                   50
    13      42                122                   60
    14      67                133                   250
    15      95                144                   170
    16      66                155                   120
    17      77                96                    150
    18      123               23                    170
    19      0                 65                    180
    20      18                24                    111
    21      24                17                    130
    22      75                85                    75
    Mean    54.4              104.3                 97.5

[Chart: AUTO MILES DRIVEN PER WEEK, sorted values and group means for urban, suburban and rural residents]
Anova: Single Factor

SUMMARY
    Groups                Count    Sum     Average       Variance
    City Residents        22       1397    63.5          2672.64286
    Suburban Residents    22       2295    104.318182    5471.65584
    Rural Residents       22       2146    97.5454545    3391.11688

ANOVA
    Source of Variation    SS            df    MS            F             P-value       F crit
    Between Groups         21054.6364    2     10527.3182    2.73782547    0.07241657    3.14280868
    Within Groups          242243.727    63    3845.13853
    Total                  263298.364    65
Why use ANOVA?

In situations where you are comparing the means from more than two groups.
In a difference of means test, you compare x2-x1.
For more than two groups, you can't compare x3-x2-x1,
so you look at the variation (sum of squares) within vs. between groups.
Intuitively, sample groups with low internal variation, but high variation across groups, will likely represent real differences in the population as a whole,
while sample groups with high internal variation and low variation across groups have a greater chance of representing populations with no real differences.
Anova: Single Factor

SUMMARY
    Groups                Count    Sum     Average       Variance
    City Residents        22       1197    54.4090909    1890.82468
    Suburban Residents    22       2295    104.318182    5471.65584
    Rural Residents       22       2146    97.5454545    3391.11688

ANOVA
    Source of Variation    SS            df    MS            F             P-value       F crit
    Between Groups         32248.5758    2     16124.2879    4.49829595    0.01492455    3.14280868
    Within Groups          225825.545    63    3584.53247
    Total                  258074.121    65
[Chart: AUTO MILES DRIVEN PER WEEK, Rural Residents, sorted values with group mean 97.5, axis 0 to 400]

    F = \frac{SS_{between} / d.f._{between}}{SS_{within} / d.f._{within}}

    SS_{between} = sum of squares between the groups
    SS_{within} = sum of squares within the groups
    d.f. = degrees of freedom
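The F ratio defined above can be computed directly from the three groups of weekly miles. This sketch assumes the 22 values per group listed on the ANOVA data sheet (the version in which the city group sums to 1197) and reproduces the sums of squares by hand rather than through a statistics package; the function name `one_way_anova` is illustrative.

```python
import statistics

city = [20, 0, 50, 100, 70, 35, 12, 150, 120, 0, 18,
        35, 42, 67, 95, 66, 77, 123, 0, 18, 24, 75]
suburban = [50, 80, 90, 350, 240, 120, 90, 80, 70, 60, 90,
            111, 122, 133, 144, 155, 96, 23, 65, 24, 17, 85]
rural = [40, 50, 60, 70, 80, 90, 100, 100, 20, 30, 40,
         50, 60, 250, 170, 120, 150, 170, 180, 111, 130, 75]


def one_way_anova(groups):
    """Return (F, SS between, SS within, df between, df within)."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = statistics.mean(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, ss_between, ss_within, df_between, df_within


f_ratio, ss_b, ss_w, df_b, df_w = one_way_anova([city, suburban, rural])
F_CRIT = 3.14280868  # critical value shown on the sheet
print(f"F = {f_ratio:.4f} (df {df_b}, {df_w}); exceeds F crit: {f_ratio > F_CRIT}")
```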
Correlation

correlation (range: -1 < r < +1)

Four example data sets (12 cases each):

            Set 1           Set 2           Set 3           Set 4
    Case    x      y        x      y        x      y        x      y
    1       0      0        0.2    0        0      1        0      0.5
    2       0.1    0.1      0.3    0.2      0.1    0.9      0.1    0.4
    3       0.2    0.2      0.4    0.4      0.2    0.8      0.2    0.6
    4       0.3    0.3      0.4    0.6      0.3    0.7      0.3    0.4
    5       0.4    0.3      0.5    0.8      0.4    0.6      0.4    0.8
    6       0.5    0.4      0.5    1        0.5    0.5      0.5    0.3
    7       0.6    0.5      0.5    1        0.6    0.3      0.6    0.5
    8       0.7    0.6      0.5    0.8      0.7    0.4      0.7    0.6
    9       0.8    0.7      0.6    0.6      0.8    0.2      0.8    0.3
    10      0.8    0.8      0.6    0.4      0.8    0.2      0.8    0.8
    11      0.9    0.9      0.7    0.2      0.9    0.1      0.9    0.2
    12      1      0.9      0.8    0        1      0        1      0.5

    correlation    +0.99    +0.07    -0.99    -0.09
    F              ###      0.0      ###      0.1
    p value        0.00     0.84     0.00     0.79

[Charts: one x-y scatterplot per data set, both axes from 0 to 1]
Example using a random number generator

    Case    x         y
    1       0.8634    0.3088
    2       0.5109    0.2311
    3       0.691     0.9983
    4       0.95      0.1494
    5       0.2146    0.7479
    6       0.7882    0.6823
    7       0.5534    0.4609
    8       0.5532    0.9613
    9       0.4124    0.3507
    10      0.3131    0.9722
    11      0.1139    0.3687
    12      0.1786    0.3059

    correlation (range: -1 < r < +1)    -0.1
    F                                   0.1
    p-value                             0.745

P-Value: this is the probability that the x-y relationship found in the sample cases -- expressed as an r-value -- is simply due to random variation, and that if one looked at the population as a whole, there would be no relationship. If p<.05, we generally conclude that the relationship is statistically significant.

Hit "recalculate now" to see a new set of numbers (note: on a MAC this is "COMMAND =")

[Chart: x-y scatterplot of the random data, both axes from 0 to 1]
Comparing values of correlation coefficient r (with a range of -1 to +1) from a sample of size n and the corresponding probability of its outcome if no relationship in the population as a whole

Chart inputs (first point of the series):
    r         -0.95
    F=        92.5641
    sign F    2.3E-06
    n         12

[Chart: Probability of this outcome (based on the F-test), 0 to 1, against Value of r, -1 to +1; red line: critical value: .05]
    ABOVE the .05 line: relationship NOT statistically significant at the 0.05 level
    BELOW the .05 line: relationship is statistically significant at the 0.05 level

Note: as r gets farther away from zero, both the strength of the relationship and the statistical significance increase.

Also: as the sample size (n) increases, the statistical significance increases.

As a result: If you want to demonstrate a statistically significant bivariate relationship, you will need an r value that is far from zero and/or a large sample.
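The link between r, n and the F statistic used for the chart can be sketched as follows. The relation F = r^2 (n - 2) / (1 - r^2) is the standard one for a two-variable regression and is an assumption here in the sense that the sheet does not spell it out; it does reproduce the sheet's F = 92.5641 for r = -0.95 and n = 12. The x and y values are the strongly negative example set from the correlation sheet; function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def f_from_r(r, n):
    """F statistic (1 and n - 2 d.f.) implied by a correlation r on n cases."""
    return r * r * (n - 2) / (1 - r * r)


x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1]
y = [1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.4, 0.2, 0.2, 0.1, 0]

r = pearson_r(x, y)
print(f"r = {r:.2f}, F = {f_from_r(r, len(x)):.1f}")
print(f"check: r = -0.95, n = 12 gives F = {f_from_r(-0.95, 12):.4f}")
```

Because F grows with both r^2 and n, the same r becomes more significant as the sample gets larger, which is the point made in the chart notes.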
Ecological Fallacy

"The ecological fallacy consists in thinking that relationships observed for groups necessarily hold for individuals..."
source: David A. Freedman (Department of Statistics, University of California, Berkeley), "Ecological Inference and the Ecological Fallacy," prepared for the International Encyclopedia of the Social & Behavioral Sciences, Technical Report No. 549, 15 October 1999. pdf file accessed Jan. 13, 2002, http://www.stanford.edu/class/ed260/freedman549.pdf

This example: Is there a relationship between use of public transit and hhd income?
    Aggregate data (unit of analysis: city): positive relationship
    Individual data (unit of analysis: hhd): negative relationship
    DANGER: making an ecological fallacy -- using aggregate data to ...

INDIVIDUAL DATA (the factor and constant columns are invisible in the sheet; factor range 0 to 1)

    case    city    percent of hhd trips      annual hhd    factor    constant
                    using public transit      income
    1       A       64%                       $57,748       0.5       80000
    2       A       61%                       $58,767       0.5       80000
    3       A       52%                       $61,738       0.5       80000
    4       A       58%                       $59,687       0.5       80000
    5       A       69%                       $55,707       0.5       80000
    6       A       67%                       $56,473       0.5       80000
    7       A       75%                       $53,775       0.5       80000
    8       A       57%                       $60,074       0.5       80000
    9       A       81%                       $51,761       0.5       80000
    10      A       67%                       $56,494       0.5       80000
    11      B       66%                       $51,894       0.4       75000
    12      B       57%                       $55,087       0.4       75000
    13      B       45%                       $59,280       0.4       75000
    14      B       50%                       $57,581       0.4       75000
    15      B       53%                       $56,602       0.4       75000
    16      B       53%                       $56,337       0.4       75000
    17      B       68%                       $51,028       0.4       75000
    18      B       60%                       $53,913       0.4       75000
    19      B       63%                       $52,819       0.4       75000
    20      B       67%                       $51,662       0.4       75000
    21      C       46%                       $53,907       0.3       70000
    22      C       43%                       $54,781       0.3       70000
    23      C       57%                       $49,885       0.3       70000
    24      C       58%                       $49,842       0.3       70000
    25      C       38%                       $56,742       0.3       70000
    26      C       51%                       $52,198       0.3       70000
    27      C       39%                       $56,205       0.3       70000
    28      C       40%                       $56,016       0.3       70000
    29      C       35%                       $57,779       0.3       70000
    30      C       42%                       $55,416       0.3       70000
    31      D       48%                       $51,254       0.2       68000
    32      D       20%                       $60,928       0.2       68000
    33      D       52%                       $49,790       0.2       68000
    34      D       46%                       $51,994       0.2       68000
    35      D       38%                       $54,799       0.2       68000
    36      D       24%                       $59,676       0.2       68000
    37      D       27%                       $58,654       0.2       68000
    38      D       26%                       $58,771       0.2       68000
    39      D       50%                       $50,643       0.2       68000
    40      D       30%                       $57,515       0.2       68000
    41      E       11%                       $59,173       0.1       63000
    42      E       10%                       $59,437       0.1       63000
    43      E       14%                       $58,202       0.1       63000
    44      E       10%                       $59,436       0.1       63000
    45      E       11%                       $59,076       0.1       63000
    46      E       41%                       $48,778       0.1       63000
    47      E       22%                       $55,147       0.1       63000
    48      E       29%                       $52,822       0.1       63000
    49      E       20%                       $55,841       0.1       63000
    50      E       19%                       $56,426       0.1       63000
    51      F       3%                        $58,852       0         60000
    52      F       6%                        $57,908       0         60000
    53      F       17%                       $53,971       0         60000
    54      F       30%                       $49,420       0         60000
    55      F       1%                        $59,539       0         60000
    56      F       3%                        $59,072       0         60000
    57      F       13%                       $55,532       0         60000
    58      F       20%                       $53,016       0         60000
    59      F       10%                       $56,452       0         60000
    60      F       8%                        $57,257       0         60000

    correlation    -0.43

[Chart: Scatterplot; unit of analysis: individual household. Annual hhd income ($40,000 to $65,000) against Percent of hhd trips by public transit (0% to 90%). One sees patterns in the individual data by cities.]

AGGREGATED DATA

    CITY    percent of hhd trips      mean annual
            using public transit      hhd income
    A       65%                       $57,222
    B       58%                       $54,620
    C       45%                       $54,277
    D       36%                       $55,402
    E       19%                       $56,434
    F       11%                       $56,102

    correlation    -0.12

[Chart: Scatterplot; unit of analysis: cities. Mean annual hhd income ($54,000 to $57,500) against Percent of hhd trips by public transit (0% to 70%). yes: cities with more transit have higher income, but...]
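The city-level correlation can be recomputed from the six aggregated rows. This is only a sketch: it assumes the rounded percentages and mean incomes shown in the AGGREGATED DATA table (cities A to F), and because the workbook regenerates its random data on each recalculation, the sign and size of the result belong to this particular snapshot rather than to the general lesson. The function name `pearson_r` is illustrative.

```python
import math

# Aggregated rows for cities A..F as displayed on the sheet.
transit_share = [0.65, 0.58, 0.45, 0.36, 0.19, 0.11]
mean_income = [57222, 54620, 54277, 55402, 56434, 56102]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


aggregate_r = pearson_r(transit_share, mean_income)
print(f"city-level correlation: {aggregate_r:.2f}")
```

Running the same function on the 60 household rows gives the individual-level correlation, so the two units of analysis can be compared side by side.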
Multiplier

Multiplier: the relationship between local and export employment

[Diagram: Twin Peaks sells Timber to the R.O.W. (rest of world); Revenues from Timber flow back and support Local Services]

Export Jobs (Basic) + Non-Export Jobs (NonBasic) = TOTAL JOBS

Imagine a simple economy of Twin Peaks, an isolated timber economy:

    Service Jobs            2,000
    Timber Jobs (export)    1,000
    Total Jobs              3,000
    Multiplier              3.0

So, can use a multiplier to estimate the impact of a change in basic employment (up or down) on total employment.
[assumes a simple, linear relationship]

    Change in Basic Employment    Change in Total Employment
    100                           300
    500                           1500
    1200                          3600
    -100                          -300
    -500                          -1500
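A minimal sketch of the multiplier arithmetic, assuming the Twin Peaks figures (1,000 export jobs, 3,000 total jobs) and the simple linear relationship the sheet states; function names are illustrative.

```python
def employment_multiplier(total_jobs, basic_jobs):
    """Total jobs supported per basic (export) job."""
    return total_jobs / basic_jobs


def change_in_total(multiplier, change_in_basic):
    """Estimated change in total employment for a change in basic employment."""
    return multiplier * change_in_basic


m = employment_multiplier(total_jobs=3000, basic_jobs=1000)
for delta in (100, 500, 1200, -100, -500):
    print(f"basic {delta:+d} -> total {change_in_total(m, delta):+.0f}")
```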
Location Quotients

Location Quotient (LQ) - a measure of relative local employment concentration in a specific sector
Used to also estimate local vs. export (i.e., non-basic vs. basic) employment
(Can also use to help understand the level of industrial diversification in a local economy)

    LQ = \frac{e_i / e}{E_i / E}

    e_i = local employment in sector i
    e = total local employment
    E_i = national employment in sector i
    E = total national employment

EXAMPLE: You are given data for the town of Icarus in the far-away country of Daedalus.

                                                           Percent of Total Employment
                                       Icarus     Daedalus     Icarus    Daedalus
    Population                         20,000     2,500,000
    Annual Gross Per Capita Income     $17,000    $25,000
    Total Employment                   10,000     1,000,000    100%      100%
    Agricultural Emp.                  1,000      50,000       10%       5%
    Govt Employment                    300        100,000      3%        10%
    Private Service Emp.               4,000      500,000      40%       50%
    Airplane Manufacturing Emp.        700        10,000       7%        1%
    Non-airplane Manufacturing Emp.    1,000      200,000      10%       20%
    All Other Employment               3,000      140,000      30%       14%

Based upon this data, which sectors of the Icarus economy likely are exporting goods or services outside the community?
Estimate the share of each sector's employment that could be due to exports
(and explain how you did these estimates and the name of the technique(s) you used).
Finally, explain why these estimates may not be accurate.

LOCATION QUOTIENT: Ratio of Local to National Percentages

    Agricultural Emp.                  2.00    export industry
    Govt Employment                    0.30
    Private Service Emp.               0.80
    Non-airplane Manufacturing Emp.    0.50

Take the location quotients to estimate amount of export jobs (if any):
TOTAL JOBS = LOCAL JOBS + EXPORT JOBS

                                       Total Jobs    Local Jobs    Export Jobs
    Agricultural Emp.                  1,000         500           500
    Govt Employment                    300           300           0
    Private Service Emp.               4,000         4,000         0
    Airplane Manufacturing Emp.        700           100           600
    Non-airplane Manufacturing Emp.    1,000         1,000         0
    All Other Employment               3,000         1,400         1,600
    TOTAL                              10,000        7,300         2,700

[Chart: Local Jobs and Export Jobs by sector, 0 to 5,000]
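The location quotients and the local/export split for Icarus can be sketched as below. The employment figures are the ones on the sheet; the export estimate assumes the common rule that a sector with LQ above 1 exports the share of its jobs in excess of the national proportion, i.e. export jobs = e_i * (1 - 1/LQ), which reproduces the sheet's export column. Function and variable names are illustrative.

```python
icarus = {
    "Agricultural": 1000,
    "Govt": 300,
    "Private Service": 4000,
    "Airplane Manufacturing": 700,
    "Non-airplane Manufacturing": 1000,
    "All Other": 3000,
}
daedalus = {
    "Agricultural": 50000,
    "Govt": 100000,
    "Private Service": 500000,
    "Airplane Manufacturing": 10000,
    "Non-airplane Manufacturing": 200000,
    "All Other": 140000,
}


def location_quotients(local, national):
    """LQ = (e_i / e) / (E_i / E) for every sector."""
    e = sum(local.values())
    big_e = sum(national.values())
    return {s: (local[s] / e) / (national[s] / big_e) for s in local}


def export_jobs(local, lq):
    """Jobs above the national share in sectors with LQ > 1, else zero."""
    return {s: local[s] * (1 - 1 / lq[s]) if lq[s] > 1 else 0.0 for s in local}


lq = location_quotients(icarus, daedalus)
exports = export_jobs(icarus, lq)
for sector in icarus:
    print(f"{sector:28s} LQ {lq[sector]:5.2f}  export jobs {exports[sector]:7.0f}")
print(f"total export jobs: {sum(exports.values()):.0f}")
```

The estimate is only as good as its assumptions, for example that local and national consumption patterns and productivity are the same.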
Gravity Model

Using Newton's Universal Law of Gravitation for social processes

    F = G \left( \frac{m_1 m_2}{r^2} \right)

    where F = force of gravity between m1 and m2
    G is the universal constant
    r is the distance between m1 and m2

[Diagram: two masses, m1 and m2, separated by distance r]

To convert to society:
    F becomes the interaction between m1 and m2 (e.g., traffic, trade, etc.)
    m1 and m2 become population (or employment, or GDP, etc.)
    r is distance
    G is a constant

Example:

             Population    Population    Distance    Interaction (e.g., car trips/day)
    G        m1            m2            r           F
    0.001    10,000        20,000        20          500
    0.001    20,000        20,000        20          1,000
    0.001    20,000        40,000        20          2,000
    0.001    10,000        20,000        10          2,000
    0.001    10,000        20,000        5           8,000
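The example table follows directly from the formula. A minimal sketch, assuming the sheet's constant G = 0.001 and its five population/distance combinations; the function name `gravity_interaction` is illustrative.

```python
def gravity_interaction(m1, m2, r, g=0.001):
    """Interaction F = G * m1 * m2 / r**2 between two places."""
    return g * m1 * m2 / r ** 2


cases = [
    (10_000, 20_000, 20),
    (20_000, 20_000, 20),
    (20_000, 40_000, 20),
    (10_000, 20_000, 10),
    (10_000, 20_000, 5),
]
for m1, m2, r in cases:
    print(f"m1={m1:>6,} m2={m2:>6,} r={r:>2} -> F={gravity_interaction(m1, m2, r):,.0f}")
```

Halving the distance quadruples the interaction, while doubling one population only doubles it.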
3 Growth Rates

Three Growth Rates

Simple, Linear Growth (e.g., average annual growth):

    P_n = P_0 (1 + nr)

Discrete Compounded Growth (e.g., annual):

    P_n = P_0 (1 + r)^n

Compounded continuously (with exponent) [almost the same results as discrete]:

    P_n = P_0 e^{rn}

    where e = 2.7183....
    remember that ln (e) = 1

A Comparison of these Three Growth Patterns

                            Linear    Discrete      Continuously
                                      Compounded    Compounded
    Po     n      r         Pn        Pn            Pn
    100    0      0.05      100       100.0         100.0
    100    1      0.05      105       105.0         105.1
    100    2      0.05      110       110.3         110.5
    100    3      0.05      115       115.8         116.2
    100    4      0.05      120       121.6         122.1
    100    5      0.05      125       127.6         128.4
    100    6      0.05      130       134.0         135.0
    100    7      0.05      135       140.7         141.9
    100    8      0.05      140       147.7         149.2
    100    9      0.05      145       155.1         156.8
    100    10     0.05      150       162.9         164.9
    100    15     0.05      175       207.9         211.7
    100    20     0.05      200       265.3         271.8
    100    25     0.05      225       338.6         349.0
    100    30     0.05      250       432.2         448.2
    100    40     0.05      300       704.0         738.9
    100    50     0.05      350       1146.7        1218.2
    100    75     0.05      475       3883.3        4252.1
    100    100    0.05      600       13150.1       14841.3

[Chart: Linear, Discrete Compounded and Continuously Compounded growth, Pn (0.0 to 16000.0) against n (0 to 120)]
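The three growth formulas can be compared with a few lines of code. A minimal sketch, assuming P0 = 100 and r = 0.05 as in the comparison table; function names are illustrative.

```python
import math


def linear_growth(p0, r, n):
    """Simple (linear) growth: P0 * (1 + n*r)."""
    return p0 * (1 + n * r)


def discrete_growth(p0, r, n):
    """Discrete compounding once per period: P0 * (1 + r)**n."""
    return p0 * (1 + r) ** n


def continuous_growth(p0, r, n):
    """Continuous compounding: P0 * e**(r*n)."""
    return p0 * math.exp(r * n)


for n in (1, 10, 50, 100):
    print(f"n={n:>3}: linear {linear_growth(100, 0.05, n):8.1f}  "
          f"discrete {discrete_growth(100, 0.05, n):9.1f}  "
          f"continuous {continuous_growth(100, 0.05, n):9.1f}")
```

The discrete and continuous results stay close for small n and diverge from the linear path quickly as n grows.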
Cost-benefit

Cost-Benefit Thinking

TWO CHALLENGES:
1. how to sum up all the costs and benefits.
2. How to deal with time: discounting. --->>> time preferences.

Present value (PV) = B(t) / (1+r)^t
    where B(t) is the benefit in year t, r is the discount rate.
Net Present Value (NPV) = sum over t of (B(t) - C(t)) / (1+r)^t
    where B is benefits and C is costs.

    NPV = \sum_{t=0}^{n} \frac{B_t - C_t}{(1 + r)^t}

    B_t = benefits in year t
    C_t = costs in year t
    t = year
    NPV = net present value (benefits adjusted for cost)
    r = discount rate (e.g., 6% per year or 0.06)

why is money worth less in the future?
    1 people are impatient (and mortal)
    2 opportunity cost of investing the capital elsewhere.

The argument for discounting is referred to as the 'marginal productivity of capital' argument, the use of the word 'marginal' indicating that it is the productivity of additional units of capital that is relevant. [99]

AND THE TRICK IS TO INCLUDE ENVIRONMENTAL COSTS AND BENEFITS. [99]

    \sum_t \frac{B_t - C_t \pm E_t}{(1 + r)^t} > 0

    if this holds, then the project is a net good project.

The Problems with Discounting for the Environment
a way to shift heavy costs to future generations.
note: it is hard to shift capital costs to future generations, since lenders want paybacks. e.g., 30 year loans. but it is easier to shift non-monetary costs to the future, since the lenders are around to complain! they don't have a contractual agr...
    1 actual damage may be far larger than the discounted value.
    2 long-term benefits are also not strongly valued (even though today's actions are required for those 50 years from now to enjoy them). i.e., they should not be discounted like capital.
    3 will lead to greater exhaustion of exhaustible resources, esp. with a high discount rate.

However: "There is, in fact, no unique relationship between high discount rates and environmental deterioration." [103]

How to select a discount rate: simply the rate of economic growth for a nation? the interest rate?

Taking sustainability into account: [104]
EX: "require that any environmental damage be compensated by projects specifically designed to improve the environment." [106]

note how the r can really change the outcome, especially if costs and benefits patterns vary over time. (see graph).

EXAMPLE

                                                 discount                 Net Benefit        Cumulative Net
          Benefit    Cost         Net Benefit    rate                     discounted for     Present Value
                                                                          present value      (NPV)
    t     B(t)       C(t)         B(t) - C(t)    r        (1+r)^t         (B(t) - C(t))      sum of (B(t) - C(t))
                                                                          / (1+r)^t          / (1+r)^t
    0     0          1,000,000    -1,000,000     0.02     1.00            -1,000,000         -1,000,000
    1     100,000    100,000      0              0.02     1.02            0                  -1,000,000
    2     110,000    100,000      10,000         0.02     1.04            9,612              -990,388
    3     120,000    100,000      20,000         0.02     1.06            18,846             -971,542
    4     130,000    100,000      30,000         0.02     1.08            27,715             -943,827
    5     140,000    100,000      40,000         0.02     1.10            36,229             -907,597
    6     150,000    100,000      50,000         0.02     1.13            44,399             -863,199
    7     160,000    100,000      60,000         0.02     1.15            52,234             -810,965
    8     170,000    100,000      70,000         0.02     1.17            59,744             -751,221
    9     180,000    100,000      80,000         0.02     1.20            66,940             -684,280
    10    190,000    100,000      90,000         0.02     1.22            73,831             -610,449
    11    200,000    100,000      100,000        0.02     1.24            80,426             -530,023
    12    210,000    100,000      110,000        0.02     1.27            86,734             -443,288
    13    220,000    100,000      120,000        0.02     1.29            92,764             -350,525
    14    230,000    100,000      130,000        0.02     1.32            98,524             -252,001
    15    240,000    100,000      140,000        0.02     1.35            104,022            -147,979
    16    250,000    100,000      150,000        0.02     1.37            109,267            -38,712
    17    260,000    100,000      160,000        0.02     1.40            114,266            75,554
    18    270,000    100,000      170,000        0.02     1.43            119,027            194,581
    19    280,000    100,000      180,000        0.02     1.46            123,558            318,139
    20    290,000    100,000      190,000        0.02     1.49            127,865            446,003

Compare front-loading and backloading costs and changing discount rates

[Chart: Benefit, Cost, Net Benefit and Cumulative Net Present Value (NPV) by Year (0 to 21), from -1,500,000 to 1,500,000]
the year when the green line crosses over the x axis (where y=0) is the year when the cumulative impact shifts from a net cost to a net benefit.
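The cumulative NPV column and the break-even year can be reproduced with a short loop. This sketch assumes the example's cash flows (a 1,000,000 cost in year 0, then a constant 100,000 annual cost and benefits rising by 10,000 a year from 100,000) and a 2% discount rate; the function name `cumulative_npv` is illustrative.

```python
def cumulative_npv(benefits, costs, r):
    """Running sum of (B(t) - C(t)) / (1 + r)**t for t = 0, 1, 2, ..."""
    running = 0.0
    series = []
    for t, (b, c) in enumerate(zip(benefits, costs)):
        running += (b - c) / (1 + r) ** t
        series.append(running)
    return series


# Years 0..20 as in the example table.
benefits = [0] + [100_000 + 10_000 * k for k in range(20)]
costs = [1_000_000] + [100_000] * 20

npv_2pct = cumulative_npv(benefits, costs, 0.02)
break_even_year = next(t for t, v in enumerate(npv_2pct) if v > 0)
print(f"NPV after 20 years at 2%: {npv_2pct[-1]:,.0f}")
print(f"first year with positive cumulative NPV: {break_even_year}")

# A higher discount rate shrinks the later net benefits.
npv_6pct = cumulative_npv(benefits, costs, 0.06)
print(f"NPV after 20 years at 6%: {npv_6pct[-1]:,.0f}")
```

Rerunning with different rates, or with costs moved to later years, shows how strongly the choice of r drives the outcome.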
gini
0.386
RANGE: 0 (PERFECT EQUALITY; 1 PERFECT INEQUALITY)
n
20
Person i   Income X(i)   Share x(i)     Cumulative x(i)   x(i)*i
                         (calculated)   (calculated)      (calculated)
 1           1,000       0.003          0.003             0.00
 2           3,000       0.009          0.013             0.02
 3           4,000       0.013          0.025             0.04
 4           5,000       0.016          0.041             0.06
 5           6,000       0.019          0.059             0.09
 6           8,000       0.025          0.084             0.15
 7           8,000       0.025          0.109             0.18
 8           9,000       0.028          0.138             0.23
 9          11,000       0.034          0.172             0.31
10          12,000       0.038          0.209             0.38
11          14,000       0.044          0.253             0.48
12          17,000       0.053          0.306             0.64
13          19,000       0.059          0.366             0.77
14          21,000       0.066          0.431             0.92
15          23,000       0.072          0.503             1.08
16          27,000       0.084          0.588             1.35
17          29,000       0.091          0.678             1.54
18          32,000       0.100          0.778             1.80
19          33,000       0.103          0.881             1.96
20          38,000       0.119          1.000             2.38
SUM        320,000       1                                14.36
mean                     0.05
Insert income amounts for each of the 20 people here -- be sure to arrange from LOW to HIGH.
Do NOT enter data in any of the other columns -- those are calculated.
Try entering both a fairly equal income distribution -- and then try a broadly unequal one.
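The worksheet calculation can be approximated in Python. This is a sketch under an assumption: it uses the rank-weighted form G = (2/n) * sum(i * x(i)) - (n + 1)/n, where x(i) is person i's share of total income with incomes sorted from low to high. That form reproduces the worksheet's 0.386 from the column sum of 14.36, but the spreadsheet's actual cell formulas are not shown here, and the function name `gini` is hypothetical.

```python
def gini(incomes):
    """Gini coefficient from a list of incomes (rank-weighted share formula)."""
    values = sorted(incomes)          # the worksheet requires LOW to HIGH order
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    # Sum of x(i) * i, where x(i) is each person's share of total income.
    weighted = sum(i * x / total for i, x in enumerate(values, start=1))
    return 2 * weighted / n - (n + 1) / n


# The 20 incomes entered in the worksheet table.
incomes = [1000, 3000, 4000, 5000, 6000, 8000, 8000, 9000, 11000, 12000,
           14000, 17000, 19000, 21000, 23000, 27000, 29000, 32000, 33000, 38000]

print(round(gini(incomes), 3))    # the worksheet example
print(round(gini([16000] * 20), 3))  # a perfectly equal distribution
```

With only 20 people, this form cannot reach exactly 1 even when a single person holds all the income; the maximum is (n - 1)/n.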
[Chart: Lorenz curve. Series: cumulative X and the line of equality; vertical axis from 0.000 to 1.000; Gini coefficient shown in the title]
The LORENZ CURVE -- see how the curve deviates from the line of equality as the Gini coefficient
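The points plotted on the Lorenz curve are the cumulative income shares from the table, paired with cumulative population shares. A minimal sketch, assuming each person counts as 1/n of the population; the function name `lorenz_points` is hypothetical:

```python
from itertools import accumulate


def lorenz_points(incomes):
    """Return (population share, income share) pairs, starting at (0, 0)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    running = accumulate(values)  # cumulative income, lowest earner first
    return [(0.0, 0.0)] + [(i / n, c / total)
                           for i, c in enumerate(running, start=1)]


incomes = [1000, 3000, 4000, 5000, 6000, 8000, 8000, 9000, 11000, 12000,
           14000, 17000, 19000, 21000, 23000, 27000, 29000, 32000, 33000, 38000]

for pop_share, income_share in lorenz_points(incomes):
    # On the line of equality the two shares would be identical.
    print(f"{pop_share:.2f}  {income_share:.3f}")
```

Each point lies on or below the line of equality; the wider the gap, the larger the Gini coefficient.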
Source of formula and text: U.S. Census Bureau, The Changing Shape of the Nation's Income Distribution, 1947-1998, Current Population Reports, by Arthur F. Jones Jr. and Daniel H. Weinberg (issued June 2000).
http://www.census.gov/prod/2000pubs/p60-204.pdf
MEASURES OF INEQUALITY/DISPARITY: how to calculate a Gini coefficient
