
Regression Intuition: the effect of college education.

Suppose we want to estimate the effect of attending college on the income
of individuals between 25 and 30 years old. For this we use a random sample
of individuals from the relevant population group.
Some of them have a college education, while the rest only completed high
school.
The average income of each group in the sample is as follows:
                       Have college     Only completed    Difference
                       education        high school       between
                       (sample mean)    (sample mean)     group means
Monthly income in US$  10,076           7,729             2,348 (1029.4)

From this difference-in-means comparison, is it possible to conclude that
attending college has a causal effect on income?

Under what circumstances does the comparison of the sample means provide
us with the causal effect of college education?

These are the characteristics of the two groups:

                              Have college     Only completed    Difference
                              education        high school       between
                              (sample mean)    (sample mean)     group means
Sex (1=male)                  0.61             0.42              0.19 (0.08)
Early IQ test                 76               73                3 (1.5)
Attended private high school  0.62             0.42              0.20 (0.2)

Are we comparing apples to apples?

Is there anything we can do if the data were not generated by a randomized
controlled trial (RCT)?

Let's suppose the data are as follows:
 n  income  college  sex  iqtest  private  |   n  income  college  sex  iqtest  private
 1    6200        0    1      70        0  |  17    6900        1    1      79        0
 2    9000        0    0      78        1  |  18    6700        0    1      79        0
 3    5500        0    0      72        0  |  19   11000        0    1      64        1
 4    9700        0    0      75        1  |  20    5250        0    0      70        0
 5    9200        0    0      75        1  |  21    5250        0    0      70        0
 6   10100        0    0      78        1  |  22   10800        1    0      78        1
 7   11700        1    0      78        1  |  23    5250        0    0      70        0
 8    7000        1    1      79        0  |  24   10900        0    1      64        1
 9    5600        0    0      72        0  |  25   13500        0    1      86        1
10    5300        1    0      70        0  |  26    6100        0    1      70        0
11    6800        1    1      70        0  |  27   13900        1    1      86        1
12    6600        0    1      79        0  |  28    9800        1    0      75        1
13    5700        0    0      72        0  |  29   12000        1    1      64        1
14   14200        1    1      86        1  |  30   14500        1    1      86        1
15    5900        1    0      72        0  |  31   12200        1    1      64        1
16    9300        0    0      75        1  |  32    6000        0    1      70        0
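
As a quick check, the raw difference in group means can be recomputed from the 32 rows of the data table. This is a minimal sketch in plain Python, used here only for illustration (the names DATA and mean are hypothetical helpers, not part of the slides).

```python
# Each tuple is (income, college, sex, iqtest, private), rows n = 1..32
# copied from the data table.
DATA = [
    (6200, 0, 1, 70, 0), (9000, 0, 0, 78, 1), (5500, 0, 0, 72, 0), (9700, 0, 0, 75, 1),
    (9200, 0, 0, 75, 1), (10100, 0, 0, 78, 1), (11700, 1, 0, 78, 1), (7000, 1, 1, 79, 0),
    (5600, 0, 0, 72, 0), (5300, 1, 0, 70, 0), (6800, 1, 1, 70, 0), (6600, 0, 1, 79, 0),
    (5700, 0, 0, 72, 0), (14200, 1, 1, 86, 1), (5900, 1, 0, 72, 0), (9300, 0, 0, 75, 1),
    (6900, 1, 1, 79, 0), (6700, 0, 1, 79, 0), (11000, 0, 1, 64, 1), (5250, 0, 0, 70, 0),
    (5250, 0, 0, 70, 0), (10800, 1, 0, 78, 1), (5250, 0, 0, 70, 0), (10900, 0, 1, 64, 1),
    (13500, 0, 1, 86, 1), (6100, 0, 1, 70, 0), (13900, 1, 1, 86, 1), (9800, 1, 0, 75, 1),
    (12000, 1, 1, 64, 1), (14500, 1, 1, 86, 1), (12200, 1, 1, 64, 1), (6000, 0, 1, 70, 0),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Split incomes by the college dummy and compare the two sample means.
college_income = [row[0] for row in DATA if row[1] == 1]
highschool_income = [row[0] for row in DATA if row[1] == 0]
raw_difference = mean(college_income) - mean(highschool_income)

print(f"college mean:      {mean(college_income):.1f}")
print(f"high school mean:  {mean(highschool_income):.1f}")
print(f"raw difference:    {raw_difference:.1f}")
```

The difference comes out at roughly 2,348, in line with the table of group means. It does not adjust for sex, IQ, or private schooling.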

Now the same data again, reordered into groups (G1 to G8) of individuals
who share the same sex, IQ test score, and type of high school:
    income  college  sex  iqtest  private  |      income  college  sex  iqtest  private
G1   14500        1    1      86        1  |  G5    7000        1    1      79        0
     13900        1    1      86        1  |        6900        1    1      79        0
     14200        1    1      86        1  |        6600        0    1      79        0
     13500        0    1      86        1  |        6700        0    1      79        0
G2   12200        1    1      64        1  |  G6    6800        1    1      70        0
     12000        1    1      64        1  |        6200        0    1      70        0
     11000        0    1      64        1  |        6100        0    1      70        0
     10900        0    1      64        1  |        6000        0    1      70        0
G3   10100        0    0      78        1  |  G7    5900        1    0      72        0
     10800        1    0      78        1  |        5700        0    0      72        0
     11700        1    0      78        1  |        5500        0    0      72        0
      9000        0    0      78        1  |        5600        0    0      72        0
G4    9700        0    0      75        1  |  G8    5300        1    0      70        0
      9800        1    0      75        1  |        5250        0    0      70        0
      9300        0    0      75        1  |        5250        0    0      70        0
      9200        0    0      75        1  |        5250        0    0      70        0

Can we now compare apples to apples?


Can you use this table to estimate the effect of college? Any idea?
Regression Analysis: It is basically comparing apples to apples!

When we first compare income only among comparable individuals, and then
estimate a weighted average of those within-group effects, we are in
essence performing a regression analysis!
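
The "compare among the comparable, then average" idea can be sketched directly on the 32 observations. This is an illustration in plain Python, not the slides' own procedure. It assumes groups are defined by identical (sex, iqtest, private) values and that each group is weighted by its size; the slides do not specify the weights, and the names DATA, groups, and weighted_effect are hypothetical.

```python
from collections import defaultdict

# Each tuple is (income, college, sex, iqtest, private), rows n = 1..32.
DATA = [
    (6200, 0, 1, 70, 0), (9000, 0, 0, 78, 1), (5500, 0, 0, 72, 0), (9700, 0, 0, 75, 1),
    (9200, 0, 0, 75, 1), (10100, 0, 0, 78, 1), (11700, 1, 0, 78, 1), (7000, 1, 1, 79, 0),
    (5600, 0, 0, 72, 0), (5300, 1, 0, 70, 0), (6800, 1, 1, 70, 0), (6600, 0, 1, 79, 0),
    (5700, 0, 0, 72, 0), (14200, 1, 1, 86, 1), (5900, 1, 0, 72, 0), (9300, 0, 0, 75, 1),
    (6900, 1, 1, 79, 0), (6700, 0, 1, 79, 0), (11000, 0, 1, 64, 1), (5250, 0, 0, 70, 0),
    (5250, 0, 0, 70, 0), (10800, 1, 0, 78, 1), (5250, 0, 0, 70, 0), (10900, 0, 1, 64, 1),
    (13500, 0, 1, 86, 1), (6100, 0, 1, 70, 0), (13900, 1, 1, 86, 1), (9800, 1, 0, 75, 1),
    (12000, 1, 1, 64, 1), (14500, 1, 1, 86, 1), (12200, 1, 1, 64, 1), (6000, 0, 1, 70, 0),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Group incomes by identical control values, then split by the college dummy.
groups = defaultdict(lambda: {0: [], 1: []})
for income, college, sex, iqtest, private in DATA:
    groups[(sex, iqtest, private)][college].append(income)

# Within each group, compare college vs. high-school mean income.
within_diffs = {}
group_sizes = {}
for key, by_college in groups.items():
    within_diffs[key] = mean(by_college[1]) - mean(by_college[0])
    group_sizes[key] = len(by_college[0]) + len(by_college[1])

# Assumption: weight each within-group difference by the group's size.
total = sum(group_sizes.values())
weighted_effect = sum(within_diffs[k] * group_sizes[k] for k in groups) / total

for key in sorted(groups):
    print(key, within_diffs[key])
print("weighted average effect:", weighted_effect)
```

With these data every group has four members, so the weighted average is the simple mean of the eight within-group differences: 662.5, far below the raw difference of about 2,348. A regression that enters the controls linearly uses somewhat different implicit weights and need not return exactly this number.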

Intuition 1: in regression analysis we estimate the average effect of the
variable of interest keeping all other factors constant (it is a ceteris
paribus concept).

Intuition 2: in regression analysis we are estimating the effect of college
on income, once we net out the effect that the other variables have on
college.
What are the ingredients of a regression?

1. The outcome variable, also known as the dependent variable: in our case, monthly income.

2. The treatment variable: in this example, a dummy variable which equals one if
the individual attended college (and zero if not).

3. A set of control variables: these are all the other factors we want to keep constant in
our estimation (sex, IQ, private high school).

The regression of interest can be written as a linear equation:

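$$\text{income}_i = \alpha + \beta\,\text{college}_i + \gamma_1\,\text{sex}_i + \gamma_2\,\text{iqtest}_i + \gamma_3\,\text{private}_i + \varepsilon_i$$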

We will use Stata to estimate the regression.


Interpreting the parameters in the regression equation:

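$$\text{income}_i = \alpha + \beta\,\text{college}_i + \gamma_1\,\text{sex}_i + \gamma_2\,\text{iqtest}_i + \gamma_3\,\text{private}_i + \varepsilon_i$$

$\alpha$ is the intercept
$\beta$ is the causal effect of interest
$\gamma_j$ is the effect of the j-th control variable included
$\varepsilon_i$ is the error term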

How are these parameters estimated?


Intuitively: They are estimated to make the predicted income as close as
possible to the observed one.
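
One common way to make "as close as possible" precise is ordinary least squares (OLS): choose the parameters that minimize the sum of squared differences between observed and predicted income. The sketch below does this in plain Python by solving the normal equations on the 32 observations. It is an illustration under that assumption, not Stata output, and the helper names solve and ols are hypothetical.

```python
# Each tuple is (income, college, sex, iqtest, private), rows n = 1..32.
DATA = [
    (6200, 0, 1, 70, 0), (9000, 0, 0, 78, 1), (5500, 0, 0, 72, 0), (9700, 0, 0, 75, 1),
    (9200, 0, 0, 75, 1), (10100, 0, 0, 78, 1), (11700, 1, 0, 78, 1), (7000, 1, 1, 79, 0),
    (5600, 0, 0, 72, 0), (5300, 1, 0, 70, 0), (6800, 1, 1, 70, 0), (6600, 0, 1, 79, 0),
    (5700, 0, 0, 72, 0), (14200, 1, 1, 86, 1), (5900, 1, 0, 72, 0), (9300, 0, 0, 75, 1),
    (6900, 1, 1, 79, 0), (6700, 0, 1, 79, 0), (11000, 0, 1, 64, 1), (5250, 0, 0, 70, 0),
    (5250, 0, 0, 70, 0), (10800, 1, 0, 78, 1), (5250, 0, 0, 70, 0), (10900, 0, 1, 64, 1),
    (13500, 0, 1, 86, 1), (6100, 0, 1, 70, 0), (13900, 1, 1, 86, 1), (9800, 1, 0, 75, 1),
    (12000, 1, 1, 64, 1), (14500, 1, 1, 86, 1), (12200, 1, 1, 64, 1), (6000, 0, 1, 70, 0),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(y, X):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


y = [row[0] for row in DATA]
# Regressors: constant, college, sex, iqtest, private.
X = [[1.0, row[1], row[2], row[3], row[4]] for row in DATA]
names = ["intercept", "college", "sex", "iqtest", "private"]

coefs = ols(y, X)
# Residual = observed income minus predicted income.
residuals = [yi - sum(b * x for b, x in zip(coefs, row)) for yi, row in zip(y, X)]

for name, b in zip(names, coefs):
    print(f"{name:>9}: {b:10.2f}")
```

Running the same function with only a constant and the college dummy, ols(y, [[1.0, row[1]] for row in DATA]), returns the raw difference in means as the slope. Adding the controls is what moves the estimate toward the apples-to-apples comparison.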
