Documente Academic
Documente Profesional
Documente Cultură
(EKN309)
CHAPTER 1
THE NATURE OF REGRESSION ANALYSIS
Regression as the main tool of econometrics (step 5).
NOW: the nature of this tool
(I. 3.2)
3) Interested in:
the dependence of personal consumption expenditure on after-tax or
disposable income
What to be estimated? slope coefficient ? What are the dependent and independent variables here?
Its functional form?
Causality from what to what?
Check the other examples listed in your book (Gujarati, 2003: 20-21)
Correlation analysis:
-
Regression analysis:
-
The explanatory variable(s): fixed values (in repeated sampling) that is, X
assumes the same values in various samples -.
Figure I.5: a steady upward trend as well as variability over the years;
the M1 time series is not stationary. (chapter 21).
Cross-section data:
-
Data on one or more variables collected at the same point in time; i.e.
HLFS, censuses, etc.
Table 1.1: US egg production: egg production and egg prices for the 50
states for 1990 AND 1991.
state
AL
AK
AZ
AR
CA
CO
CT
DE
(..)
Y1
2,206
0.7
73
3,620
7,472
788
1,029
168
(..)
Y2
2,186
0.7
74
3,737
7,444
873
948
164
(..)
X1
92.7
151.0
61.0
86.3
63.4
77.8
106.0
117.0
(..)
X2
91.4
149.0
56.0
91.8
58.4
73.0
104.0
113.0
(..)
state
MT
NE
NV
NH
NJ
NM
NY
NC
(..)
Y1
172
1,202
2.2
43
442
283
975
3,033
(..)
Y2
164
1,400
1.8
49
491
302
987
3,045
(..)
X1
68.0
50.3
53.9
109.0
85.0
74.0
68.1
82.8
(..)
X2
66.0
48.9
52.7
104.0
83.0
70.0
64.0
78.7
(..)
Pooled data:
-
The one in Table 1.1 (if both years are considered together);
A special type of pooled data where the same cross-sectional unit (i.e. A
family or a firm) is surveyed over time.
The reseracher should always keep in mind that the results of research are
only as good as the quality of data have no choice but to depend on the
available data:
The results of the research are unsatisfactory:
-
CHAPTER 2
TWO VARIABLE REGRESSION ANALYSIS:
SOME BASIC IDEAS
Chapter 1: the concept of regression in broad terms.
Chapter 2 and 3: introduction to the theory underlying the simplest possible
regression analysis the bivariate, or two-variable, regression:
where the dependent variable (the regressand) is related to a single
explanatory variable (the regressor).
Geometrically;
It is the locus of the conditional means of the dependent variable for the fixed
values of the explanatory variable(s).
More simply;
It is the curve connecting the means of the subpopulations of Y corresponding
to the given values of the regressor X.
See Figure 2.2: Population regression line (data of Table 2.1)
(. . )
. .
From now on the term linear regression will always mean a regression that is
linear in the parameters; the s (that is, the parameters are raised to the first
power only). It may or may not be linear in the explanatory variables, the Xs.
YES
NO
YES
NO
(. . )
= +
= 1 + 2 +
=
Since
itself:
= +
(. . )
= 0 (. . )
Thus, the assumption that the regression line passes through the conditional
means of Y implies that the conditional mean values of (conditional upon
the given Xs) are zero.
1. Vagueness of theory
ignorant or unsure about the other variables affecting Y; : as a substitute for
all the excluded or omitted variables from the model.
2. Unavailability of data
Even if we know what some of the excluded variables are, we may not have
quantitative information about them; captured in .
3. Core variables versus peripheral variables
The joint influence of all or some of the variables are so small and it is
meaningless to introduce them into the model explicitly; combined effect
being treated as a random variable, .
Even if we succeed in introducing all the relevant variables into the model,
some intrinsic randomness in individual Ys that cannot be explained; s
reflecting this randomness.
5. Poor proxy variables
Errors in measurement where data may not be measured accurately;
representing the errors of measurement.
6. Principle of parsimony
To keep the as simple as possible; representing all other variables.
7. Wrong functional form
Correct variables explaining a phenomenon but not sure about the functional
form. The scattergram: helpful if two-variable model is concerned.
= 1 + 2
(. . )
= 1 + 2 +
(. . )
TO SUM UP
= 1 + 2 +
. .
= 1 + 2 +
(. . )
WHY? Because more often we do not have data for all the population.
For = , we have one (sample) observation = see tables 2.4 and 2.5 .
= +
(. . )
= + (. . )
Granted that the SRF is but an approximation of the PRF, can we devise a
rule or method that will make this approximation as close as possible?
OR:
How should the SRF be constructed so that 1 is as close as possible to the
true 1 AND 2 is as close as possible to the true 2 even though we will
never know the true 1 2 ?
CHAPTER 3: procedures that tell us how to construct the SRF to mirror the PRF as
faithfully as possible.