02-04-2014

Dr. Mita Suthar
H.L. Institute of Commerce
T. Y. B. Com.
2013-2014
4/2/2014
Gujarati, Damodar. 2004. Basic Econometrics. 4th ed. (or later). McGraw Hill.
Chapters: Introduction, 1, 2, 5, 6, 7, 8, 9, Appendix A
Gujarati, Damodar. 2011. Econometrics by Example. 1st ed. Palgrave Macmillan.
Chapters: 1-3
[Figure: Mathematical Economics & Econometrics, drawing on Economics, Mathematics and Statistics]
Mathematical modeling of economic theory: to define the explanatory variables and the precise functional relation
Validation of the model using statistical data: to understand whether real data support the theoretical framework
Inference: to understand the functional relation and the significance of the explanatory variables
Forecasting: to predict the value of the endogenous variable
Function
Variables, parameters & constants
Slope & intercept terms
Endogenous & exogenous vis-à-vis dependent & independent variables
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$
Two-variable LRM: $Y_i = \beta_1 + \beta_2 X_i + u_i$
What happens when more than one factor causes changes in the dependent variable?
Linearity in variables or parameters
Two components: deterministic, and nonsystematic or random
Deterministic component: $E(Y_i \mid X_i)$ as the conditional mean
Variables
Regression coefficients: intercept, partial slope
Error term
/
2
/
2
0
1
4
6
02-04-2014
4
Dependent variable (Y): a random variable, though its range may be specified
Four measurement scales:
Ratio:
o Most of the variables fall in this category
o The ratio of two values of a variable is meaningful
o The distance between values is meaningful
o The ordering of values is meaningful
Interval:
o The distance between values is meaningful, but their ratio is not; such as the time variable
Ordinal:
o Used for graded / classified variables
Nominal:
o Used for attributes, dummy variables, etc.
Explanatory variables (X): nonrandom in the CLRM
Fixed values in repeated sampling
Regression is conditional on the given values of X
Measured on the same scales as the Y variable
Could be random
Random
Omitted variables
Negligence
Lack of data
Unquantifiable variables
Captures randomness of human behaviour
Average effect is marginal
The error term captures all the factors that affect Y but are not explicitly included in the model. Why not include these other variables?
1. Vagueness of theory
2. Unavailability of data
3. Core variables versus peripheral variables
4. Intrinsic randomness in human behavior
5. Poor proxy variables
6. Principle of parsimony
7. Wrong functional form
Linearity in parameters
Random
Estimate values on the basis of sample data
Time Series
Denoted as $X_t$, $Y_t$
Annual, quarterly, monthly, etc.; also high-frequency data
Problem of autocorrelation
May lack stationarity: mean and variance may vary systematically over time
Example: GDP of India from 1950-51 to 2012-13
Cross Section
Denoted as $X_i$, $Y_i$
Problem of heterogeneity, with observations having their own characteristics
Example: GDP of several countries during one year / quarter
Pooled
Combination of time series & cross section
The units surveyed may change over time
Example: GDP of several countries over a period of time (the set of countries may be the same or different, new countries may be added, etc.)
Panel
Special category of pooled data
Data on the same cross-sectional units over time
Example: GDP of the same set of countries over a period of time
Primary & secondary
Some secondary data sources
NSSO & CSO
Government departments
Reserve Bank of India
World Bank
International Monetary Fund
CMIE
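Why study sample data?
Sampling techniques
Drawing inferences about the population on the basis of a sample study
Population regression function (PRF), population regression line (PRL), sample regression function (SRF) & sample regression line (SRL)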
Regression analysis studies the relation between an explained (endogenous) variable and one or more explanatory variables
Statistical, not functional or deterministic, dependence among variables
Random or stochastic variables
Does not imply causation: causality between variables must come from theory
Regression versus correlation
Correlation measures the strength or degree of linear association between two variables
Regression estimates or predicts the average value of one variable on the basis of the fixed values of other variables
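A minimal sketch of the contrast between correlation and regression, assuming NumPy and a small hypothetical sample: the ten consumption values are simply the first (smallest) entry in each income column of the population table in these slides, used for illustration only.

```python
import numpy as np

# Hypothetical sample (illustration only): weekly income X and the first listed
# weekly consumption value in each income column of the population table.
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)

# Correlation: symmetric, unit-free measure of linear association
r_xy = np.corrcoef(X, Y)[0, 1]
r_yx = np.corrcoef(Y, X)[0, 1]  # identical to r_xy

# Regression: slope of the line predicting the average of one variable from the other
x, y = X - X.mean(), Y - Y.mean()
slope_y_on_x = (x @ y) / (x @ x)  # change in average Y per unit of X
slope_x_on_y = (x @ y) / (y @ y)  # regressing X on Y gives a different line

print(f"r = {r_xy:.4f}")
print(f"slope of Y on X = {slope_y_on_x:.4f}, slope of X on Y = {slope_x_on_y:.4f}")
```

Swapping X and Y leaves the correlation unchanged but changes the regression slope; the product of the two slopes equals $r^2$.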
Example: Estimate the average consumption of 60 families in a locality
Population = 60 families
Analyze the relationship between the consumption expenditure of each family (Y) and its level of income (X)
Population Regression Line (PRL)
Shows the average, or mean, value of the dependent variable corresponding to each value of the independent variable, in the population as a whole:
$E(Y \mid X_i) = f(X_i)$
Since the PRL is approximately linear, mathematically it can be expressed as
$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$
which is called the Population Regression Function (PRF)
Linear in parameters $\beta_1$ and $\beta_2$
How to express the deviation of a specific $Y_i$ around its expected value?
$Y_i = E(Y \mid X_i) + u_i$
or
$u_i = Y_i - E(Y \mid X_i)$
where the deviation $u_i$ is an unobservable random variable taking positive or negative values, known as the stochastic disturbance (error term)
Taking the expected value of both sides of the stochastic PRF, conditional on $X_i$, we obtain the following:
$E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(u_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i)$
Since $E(Y_i \mid X_i) = E(Y \mid X_i)$,
then $E(u_i \mid X_i) = 0$
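The definition of $u_i$ and the result $E(u_i \mid X_i) = 0$ can be checked numerically on a single column of the population table in these slides (the five families with weekly income X = 80). A small sketch, assuming NumPy:

```python
import numpy as np

# The five families with weekly income X = 80 in the population table
Y_at_80 = np.array([55, 60, 65, 70, 75], dtype=float)

cond_mean = Y_at_80.mean()   # E(Y | X = 80)
u = Y_at_80 - cond_mean      # u_i = Y_i - E(Y | X_i): positive or negative

print("E(Y|X=80) =", cond_mean)
print("u_i =", u)
print("average of u_i =", u.mean())  # zero, as E(u_i | X_i) = 0 requires
```

Because the conditional mean is the average of its own column, the deviations around it always average to zero within that column.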
What happens when the population is unknown, or too large?
Select a sample from this population
Repeated sampling: different samples will provide different sets of information
Selecting Random Samples from the population
Weekly family income (X)          80   100   120   140   160   180   200   220   240   260
Weekly family consumption (Y)     55    65    79    80   102   110   120   135   137   150
                                  60    70    84    93   107   115   136   137   145   152
                                  65    74    90    95   110   120   140   140   155   175
                                  70    80    94   103   116   130   144   152   165   178
                                  75    85    98   108   118   135   145   157   175   180
                                   -    88     -   113   125   140     -   160   189   185
                                   -     -     -   115     -     -     -   162     -   191
Conditional mean E(Y|X)           65    77    89   101   113   125   137   149   161   173
[Figure: Sample I and Sample II]
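A short sketch, assuming NumPy, that recomputes the conditional means from the table above and fits a straight line through them; storing the table as a dictionary keyed by income is just one convenient representation.

```python
import numpy as np

# Population of 60 families from the table: weekly income X -> weekly consumption values
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

incomes = sorted(population)
X = np.array(incomes, dtype=float)
cond_means = np.array([np.mean(population[inc]) for inc in incomes])  # E(Y | X_i)

# Straight line through the conditional means: E(Y|X_i) = b1 + b2 * X_i
b2, b1 = np.polyfit(X, cond_means, deg=1)  # polyfit returns [slope, intercept]

print("E(Y|X):", cond_means)
print(f"PRF: E(Y|X) = {b1:.2f} + {b2:.2f} X")
```

With these data the ten conditional means lie exactly on the line $E(Y \mid X_i) = 17 + 0.6X_i$, which is why the PRL can be written in the linear form $\beta_1 + \beta_2 X_i$.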
[Figure: PRF]
Mathematically, this estimation is expressed as:
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$
where $\hat{Y}_i$ = estimator of $E(Y \mid X_i)$, the estimator of the population conditional mean
$\hat{\beta}_1$ = estimator of $\beta_1$
$\hat{\beta}_2$ = estimator of $\beta_2$
Not all the sample data lie exactly on the respective sample regression line. Hence, the stochastic model is estimated:
$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$
where $\hat{u}_i$ = estimator of $u_i$
$\hat{u}_i$ (the residual) represents the difference between the actual Y values and their estimated values from the sample regression:
$\hat{u}_i = Y_i - \hat{Y}_i$
In the estimation process, the actual $\beta_1$, $\beta_2$ and $u_i$ are not observed, but their proxies ($\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{u}_i$) are observed
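The slides do not spell out the estimator formulas; the sketch below assumes the standard two-variable OLS formulas in deviation form, $\hat{\beta}_2 = \sum x_i y_i / \sum x_i^2$ and $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$, applied to a hypothetical sample of one family per income level (the first value in each column of the population table). NumPy is assumed.

```python
import numpy as np

# Hypothetical sample: one family per income level (first value in each table column)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)

# OLS estimators in deviation form (standard two-variable formulas)
x = X - X.mean()
b2_hat = (x @ (Y - Y.mean())) / (x @ x)  # estimator of beta_2
b1_hat = Y.mean() - b2_hat * X.mean()    # estimator of beta_1

Y_hat = b1_hat + b2_hat * X  # sample regression function: estimates of E(Y|X_i)
u_hat = Y - Y_hat            # residuals: observable proxies for the unobservable u_i

print(f"SRF: Y_hat = {b1_hat:.4f} + {b2_hat:.4f} X")
print("residuals:", np.round(u_hat, 3))
```

For OLS with an intercept the residuals sum to zero and are uncorrelated with X, which can be verified directly from `u_hat`.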
$\hat{\beta}$: estimators of the $\beta$ coefficients
Residual $\hat{u}$: an estimator of the error term $u$
An estimator: the formula or rule to find the values of the regression parameters
Estimate: the numerical value taken by the estimator in a particular sample
Estimation: the procedure to find estimates of the regression parameters
1. Linear in parameters
2. Regressors (explanatory variables) are nonstochastic (nonrandom, fixed) in repeated sampling
3. Expected value of error term is zero: $E(u_i \mid X_i) = 0$
4. Homoscedasticity: variance of each $u_i$ is constant: $\operatorname{var}(u_i \mid X_i) = \sigma^2$
[Figure: Homoscedasticity vs. heteroscedasticity]
5. No autocorrelation between two error terms: $\operatorname{cov}(u_i, u_j \mid X_i, X_j) = 0$, $i \neq j$
[Figure: No autocorrelation vs. positive or negative autocorrelation]
6. No perfect multicollinearity: no exact linear relationships among the X variables, i.e., relations like $X_{2i} = 2X_{3i} + 3X_{4i}$ do not exist
7. No specification bias: the regression model is correctly specified; the number of observations n is larger than the number of parameters to be estimated, k
8. $u_i \sim N(0, \sigma^2)$
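Assumption 6 can be illustrated with hypothetical numbers: if $X_2 = 2X_3 + 3X_4$ holds exactly, the design matrix loses full column rank, so $X'X$ cannot be inverted and OLS cannot separate the three coefficients. A NumPy sketch (all data values are made up):

```python
import numpy as np

# Hypothetical data for two regressors; X2 is built as an exact linear function of them
X3 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X4 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
X2 = 2 * X3 + 3 * X4  # perfect multicollinearity: X2 = 2*X3 + 3*X4

ones = np.ones_like(X3)
Z_full = np.column_stack([ones, X2, X3, X4])  # intercept + three regressors: k = 4
Z_reduced = np.column_stack([ones, X3, X4])   # X2 dropped

# A rank below the number of columns means X'X is singular and the
# separate effects of X2, X3 and X4 cannot be estimated.
print("rank with X2:", np.linalg.matrix_rank(Z_full), "of", Z_full.shape[1], "columns")
print("rank without X2:", np.linalg.matrix_rank(Z_reduced), "of", Z_reduced.shape[1], "columns")
```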
Linear: linear functions of the endogenous variable Y
Unbiased: in repeated applications of the estimation method, their average (expected) value equals the true parameter value
Efficient / Best: among linear, unbiased estimators, the OLS estimators have minimum variance
Hence, OLS estimators are BLUE (Best Linear Unbiased Estimators)
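The unbiasedness property can be illustrated with a small Monte Carlo experiment, a sketch under stated assumptions: the "true" model is taken to be $Y_i = 17 + 0.6X_i + u_i$ (the line through the conditional means of the population table), the error standard deviation σ = 5 is a hypothetical choice, and the X values are held fixed in repeated sampling as in assumption 2. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Assumed "true" model for the simulation: Y_i = 17 + 0.6 X_i + u_i, u_i ~ N(0, 5^2)
beta1, beta2, sigma = 17.0, 0.6, 5.0  # sigma = 5 is a hypothetical choice
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)  # fixed
x = X - X.mean()

reps = 5000
U = rng.normal(0.0, sigma, size=(reps, X.size))  # a fresh error vector for every sample
Y = beta1 + beta2 * X + U                        # shape (reps, n): one sample per row

# OLS slope for each sample: sum(x_i * Y_i) / sum(x_i^2), valid because sum(x_i) = 0
b2_hats = (Y @ x) / (x @ x)

print("average of the slope estimates:", b2_hats.mean())
print("sampling variance:", b2_hats.var(), "| sigma^2 / sum(x^2):", sigma**2 / (x @ x))
```

The average of the 5,000 slope estimates should sit very close to 0.6, and their spread should match the textbook variance formula $\sigma^2 / \sum x_i^2$ (assumed here; the slides do not state it).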
Part I: Individual Parameters Test
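Standard errors are similar to the variance and standard deviation, which measure the variability of a variable
In regression, the standard deviation of an estimator is known as its standard error
Estimate of the variance ($\sigma^2$) of the error term ($u_i$):
$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-k} = \frac{RSS}{df}$
where $RSS = \sum \hat{u}_i^2$ (residual sum of squares) and $df = n - k$ (degrees of freedom)
Standard Error of Regression (SER): $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$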
Standard deviation of Y ($S_Y$) > SER
If the regressors (Xs) do not have a (significant) impact on Y, the above may not hold true
Then the regression exercise is futile, and $\bar{Y}$ is the best estimate of Y
But the Xs are believed to explain the behavior of Y in a superior way that cannot be captured by $\bar{Y}$ alone
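A sketch of these quantities for the hypothetical ten-family sample (first value in each income column of the population table), assuming NumPy and the standard two-variable formula $\operatorname{var}(\hat{\beta}_2) = \sigma^2 / \sum x_i^2$ for the slope's standard error, which the slides do not state explicitly:

```python
import numpy as np

# Hypothetical ten-family sample (first value in each income column of the table)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)
n, k = Y.size, 2  # two-variable model: beta_1 and beta_2

x = X - X.mean()
b2_hat = (x @ (Y - Y.mean())) / (x @ x)
b1_hat = Y.mean() - b2_hat * X.mean()
u_hat = Y - (b1_hat + b2_hat * X)

RSS = u_hat @ u_hat
sigma2_hat = RSS / (n - k)             # estimate of the error variance sigma^2
SER = np.sqrt(sigma2_hat)              # standard error of regression
se_b2 = np.sqrt(sigma2_hat / (x @ x))  # standard error of the slope estimator
S_Y = Y.std(ddof=1)                    # sample standard deviation of Y

print(f"SER = {SER:.3f}, S_Y = {S_Y:.3f}, se(b2_hat) = {se_b2:.5f}")
```

Here income explains consumption well, so the SER is far smaller than $S_Y$: predicting Y from the regression line is much more accurate than predicting it by $\bar{Y}$ alone.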
Normality assumption
The variance of the coefficients involves $\hat{\sigma}^2$ as the estimator of $\sigma^2$
Hence, use the t-distribution for the OLS estimators of the regression coefficients
The t-distribution approaches the normal distribution in large samples
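Null hypothesis ($H_0$): $\beta_j = 0$, which indicates insignificance (the regressor $X_j$ has no impact on Y)
$t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$, with n-k degrees of freedom
Two ways to make decisions:
If the p-value < 0.05, then reject $H_0$
OR
If $|t_{calculated}| > t_{critical}$, then reject $H_0$, at a specified level of significance ($\alpha$), like 5%
The level of significance can be changed to 10% or 1% or any other level
Probability values (p-values): the exact level of significance of the computed t values
Low p-value → estimated regression coefficient is significant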
P-value: lower values, viz., lower than 0.05
→ Regression coefficient: estimated coefficient is statistically significant
→ Explanatory variable: has a significant impact on the regressand / endogenous variable
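A sketch of the t-test decision rule for the slope, reusing the hypothetical ten-family sample and NumPy. Instead of computing a p-value (which would need a t-distribution routine, e.g. from SciPy), it compares $|t|$ with the 5% two-tailed critical value for 8 degrees of freedom taken from t tables (2.306).

```python
import numpy as np

# Hypothetical ten-family sample (first value in each income column of the table)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)
n, k = Y.size, 2

x = X - X.mean()
b2_hat = (x @ (Y - Y.mean())) / (x @ x)
b1_hat = Y.mean() - b2_hat * X.mean()
u_hat = Y - (b1_hat + b2_hat * X)
se_b2 = np.sqrt((u_hat @ u_hat) / (n - k) / (x @ x))

t_stat = b2_hat / se_b2  # t statistic for H0: beta_2 = 0
t_crit = 2.306           # 5% two-tailed critical t value, n - k = 8 df (from t tables)

if abs(t_stat) > t_crit:
    print(f"t = {t_stat:.2f} > {t_crit}: reject H0, income has a significant impact")
else:
    print(f"t = {t_stat:.2f} <= {t_crit}: do not reject H0")
```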
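Two-sided $100(1-\alpha)\%$ confidence interval is constructed as
$\Pr\left[-t_{\alpha/2} \le \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \le t_{\alpha/2}\right] = 1 - \alpha$
Confidence interval will be
$\hat{\beta}_j - t_{\alpha/2}\, se(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t_{\alpha/2}\, se(\hat{\beta}_j)$
Shows the range of values that, in repeated sampling, will include the true population parameter value in $100(1-\alpha)\%$ (e.g., 95%) of cases
Confidence coefficient (CC) = $100(1-\alpha)\%$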
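A sketch of the 95% interval for $\beta_2$, assuming NumPy, the same hypothetical ten-family sample, and $t_{\alpha/2} = 2.306$ for 8 degrees of freedom from t tables:

```python
import numpy as np

# Hypothetical ten-family sample (first value in each income column of the table)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)
n, k = Y.size, 2

x = X - X.mean()
b2_hat = (x @ (Y - Y.mean())) / (x @ x)
b1_hat = Y.mean() - b2_hat * X.mean()
u_hat = Y - (b1_hat + b2_hat * X)
se_b2 = np.sqrt((u_hat @ u_hat) / (n - k) / (x @ x))

t_crit = 2.306  # t_{alpha/2} for alpha = 0.05 and n - k = 8 df (from t tables)
lower = b2_hat - t_crit * se_b2
upper = b2_hat + t_crit * se_b2

print(f"95% confidence interval for beta_2: [{lower:.4f}, {upper:.4f}]")
```

An interval that excludes 0 leads to the same conclusion as rejecting $H_0: \beta_2 = 0$ in a two-sided test at the same level of significance.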
Part II: Overall Significance of Model
Coefficient of determination ($R^2$): an overall measure of goodness of fit of the estimated regression function
Proportion of the total variation in the endogenous / dependent variable that is explained by the explanatory variable(s)
$Y_i = \hat{Y}_i + \hat{u}_i$ OR $y_i = \hat{y}_i + \hat{u}_i$
where $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y}$ indicate deviations from the mean values
Squaring this and summing over all i's (the cross-product term $\sum \hat{y}_i \hat{u}_i$ is zero for OLS):
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2$
$TSS = ESS + RSS$
That is, the total variation of the actual Y values about their sample mean (TSS) is equal to the sum of the total variation of the estimated Y values about their mean value (ESS) and the sum of the squared residual values (RSS)
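The decomposition can be checked numerically; a NumPy sketch on the hypothetical ten-family sample, which also confirms that the cross-product term vanishes for OLS:

```python
import numpy as np

# Hypothetical ten-family sample (first value in each income column of the table)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)

x = X - X.mean()
b2_hat = (x @ (Y - Y.mean())) / (x @ x)
b1_hat = Y.mean() - b2_hat * X.mean()
Y_hat = b1_hat + b2_hat * X
u_hat = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()           # total sum of squares
ESS = ((Y_hat - Y.mean()) ** 2).sum()       # explained sum of squares
RSS = (u_hat ** 2).sum()                    # residual sum of squares
cross = ((Y_hat - Y.mean()) * u_hat).sum()  # cross-product term, zero for OLS

print(f"TSS = {TSS:.3f}, ESS + RSS = {ESS + RSS:.3f}, cross term = {cross:.2e}")
```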
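Further, dividing both sides by TSS:
$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$
$R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\sum \hat{y}_i^2}{\sum y_i^2}$
OR
$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum \hat{u}_i^2}{\sum y_i^2}$
And, $0 \le R^2 \le 1$
If the number of regressors increases, R-square never decreases (it usually increases)
Should one maximize R-square by adding more regressors?
Adjusted R-square ($\bar{R}^2$) measures $R^2$ adjusted for the degrees of freedom:
$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$
If k > 1, then $\bar{R}^2 < R^2$
For a given $R^2$, $\bar{R}^2$ falls further below $R^2$ as k increases
$R^2$ is never negative, but $\bar{R}^2$ can be negative
$\bar{R}^2$ is used to compare two or more regression models having the same endogenous variable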
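A sketch of why $\bar{R}^2$ is needed, assuming NumPy: adding a regressor that has nothing to do with Y (here a hypothetical column of random numbers) cannot lower $R^2$, while $\bar{R}^2$ charges for the lost degree of freedom. The helper `r2_and_adjusted` is an illustrative name, not part of any library.

```python
import numpy as np

def r2_and_adjusted(Z, Y):
    """Return (R^2, adjusted R^2) for an OLS fit of Y on the columns of Z (Z includes the intercept)."""
    n, k = Z.shape
    beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    u_hat = Y - Z @ beta_hat
    RSS = u_hat @ u_hat
    TSS = ((Y - Y.mean()) ** 2).sum()
    R2 = 1 - RSS / TSS
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)
    return R2, R2_adj

# Hypothetical ten-family sample (first value in each income column of the table)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)

rng = np.random.default_rng(seed=0)
irrelevant = rng.normal(size=Y.size)  # hypothetical regressor unrelated to Y

Z_small = np.column_stack([np.ones_like(X), X])            # k = 2
Z_big = np.column_stack([np.ones_like(X), X, irrelevant])  # k = 3

R2_s, adj_s = r2_and_adjusted(Z_small, Y)
R2_b, adj_b = r2_and_adjusted(Z_big, Y)
print(f"income only:        R2 = {R2_s:.5f}, adjusted R2 = {adj_s:.5f}")
print(f"plus random column: R2 = {R2_b:.5f}, adjusted R2 = {adj_b:.5f}")
```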
$H_0$: All slope coefficients in the regression model are simultaneously zero
OR
All regressors have no impact on the endogenous variable / regressand
F-test:
$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$; where df = (k-1, n-k)
If $F_{calculated} > F_{critical}$, then reject $H_0$, at a specified level of significance ($\alpha$), like 5%
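A NumPy sketch of the F-test on the hypothetical ten-family sample. It computes F both from the sums of squares and from $R^2$, and compares it with the 5% critical value $F(1, 8) = 5.32$ from F tables. In the two-variable model there is a single slope, so F equals the square of the slope's t statistic.

```python
import numpy as np

# Hypothetical ten-family sample (first value in each income column of the table)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([55, 65, 79, 80, 102, 110, 120, 135, 137, 150], dtype=float)
n, k = Y.size, 2

x = X - X.mean()
b2_hat = (x @ (Y - Y.mean())) / (x @ x)
b1_hat = Y.mean() - b2_hat * X.mean()
Y_hat = b1_hat + b2_hat * X
u_hat = Y - Y_hat

ESS = ((Y_hat - Y.mean()) ** 2).sum()
RSS = u_hat @ u_hat
R2 = ESS / (ESS + RSS)  # uses TSS = ESS + RSS

F_from_ss = (ESS / (k - 1)) / (RSS / (n - k))
F_from_r2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))
F_crit = 5.32  # 5% critical F value with df = (1, 8) (from F tables)

t_stat = b2_hat / np.sqrt(RSS / (n - k) / (x @ x))
print(f"F = {F_from_ss:.2f} (via R^2: {F_from_r2:.2f}), t^2 = {t_stat ** 2:.2f}")
print("reject H0:", F_from_ss > F_crit)
```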
Alternative representation of variables
What happens if variables are qualitative?
Nominal scale variables
Can be quantified by creating dummy variables
Values:
0 → absence of an attribute
1 → presence of the same attribute
Also known as indicator variables / categorical variables / qualitative variables
The CLRM can be applied
If the intercept term is included in the regression model and the qualitative variable has m categories, then only m-1 dummy variables should be introduced (using all m creates perfect multicollinearity: the dummy variable trap)
The category that is assigned the value 0 is the reference / benchmark / comparison category
Keeping track of reference categories is crucial for interpretation
Interactive dummies
Logarithmic transformation of dummy variables is not possible if the values are 0 and 1 (log 0 is undefined), but it is possible if the values are, e.g., 10 and 5
Limit the number of dummy variables, especially if the sample size is relatively small
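The m-1 rule can be seen in the rank of the design matrix. A NumPy sketch with a hypothetical three-category variable (the region labels are made up): with an intercept, the three dummies add up to the intercept column, so one of them must be dropped, and the dropped category becomes the reference.

```python
import numpy as np

# Hypothetical qualitative variable with m = 3 categories
region = np.array(["North", "South", "West", "North", "West", "South", "North", "West"])
categories = ["North", "South", "West"]

# One 0/1 dummy per category: 1 = attribute present, 0 = absent
D = np.column_stack([(region == c).astype(float) for c in categories])
ones = np.ones(region.size)

# Intercept + all m dummies: the dummies sum to the intercept column (dummy variable trap)
Z_trap = np.column_stack([ones, D])
# Intercept + m - 1 dummies: "North" (dropped) is the reference category
Z_ok = np.column_stack([ones, D[:, 1:]])

print(np.linalg.matrix_rank(Z_trap), Z_trap.shape[1])  # 3 4: rank-deficient
print(np.linalg.matrix_rank(Z_ok), Z_ok.shape[1])      # 3 3: full column rank
```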
Gender categories
Political affiliations
Structural change in the economy
Deseasonalization / Seasonal adjustments:
Seasonal data in time series
Differential intercept dummy
Differential slope dummy
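A minimal sketch, assuming NumPy and hypothetical noise-free data, of a model with both a differential intercept dummy and a differential slope dummy: $Y_i = \beta_1 + \beta_2 D_i + \beta_3 X_i + \beta_4 (D_i X_i)$. Here $\beta_2$ shifts the intercept and $\beta_4$ shifts the slope for the group with $D = 1$; the coefficient values are made up for illustration.

```python
import numpy as np

# Hypothetical noise-free data for two groups (D = 0: reference group, D = 1: other group)
X = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5], dtype=float)
D = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Assumed model: Y = b1 + b2*D + b3*X + b4*(D*X)
b_true = np.array([10.0, 5.0, 2.0, 1.5])
Z = np.column_stack([np.ones_like(X), D, X, D * X])  # D*X is the interactive (slope) dummy
Y = Z @ b_true

b_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print("reference group: intercept", b_hat[0], "slope", b_hat[2])
print("D = 1 group: intercept", b_hat[0] + b_hat[1], "slope", b_hat[2] + b_hat[3])
```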