For 3 credits
Syllabus Hours (Total: 42)
4 Regression 6
Statistical versus Deterministic Relationships; Regression versus Causation; Two-variable Regression Analysis; Population Regression Function (PRF); Stochastic Specification of the PRF; The Significance of the Stochastic Disturbance Term; The Sample Regression Function (SRF); Method of Ordinary Least Squares; Properties of Least-Squares Estimators: The Gauss-Markov Theorem; Coefficient of Determination r²: A Measure of "Goodness of Fit"; Monte Carlo Experiments
5 Classical Normal Linear Regression Model (CNLRM) 6
The Probability Distribution of Disturbances (µ); Normality Assumption; Method of Maximum Likelihood; Multiple Regression Analysis: The Problem of Estimation; The Problem of Inference; Cobb-Douglas Production Function; Polynomial Regression Model; Testing for Structural or Parametric Stability of Regression Models: The Chow Test
6 Presentation on the Application of Mathematics, Statistics, Operational Research, Computer Science or any other related subject to any aspect of Economics 6 hrs.
3. Review of Statistics 14
3.4 Basic Ingredients of an Empirical Study: Formulating a Model; Gathering Data; Descriptive Statistics and its Use in Business; Measures of Central Tendency: AM, GM and HM, Median, Mode; Dispersion: Range, Quartile, Standard Deviation; Skewness; Kurtosis
3.5 Probability: Discrete and Continuous; Probability Distributions: Binomial and Poisson Distributions
3.6 Sampling Techniques; Estimation and Hypothesis Testing; Interpreting the Results
Mid semester
4 Regression 8 Hours
The Gauss-Markov Theorem; Coefficient of Determination r²: A Measure of "Goodness of Fit"; Monte Carlo Experiments
5 Classical Normal Linear Regression Model (CNLRM) 4 Hours
Suggested books
Index
S.N. Subject
1 First Note
Introduction
In recent times, Mathematical Economics and Econometrics have attracted the attention of engineering students. Two important reasons are, first, the need to understand economic forces and develop models at the micro level and, secondly, the need to understand the economic environment at the macro level. Some students like to explore other disciplines where they can use their basic analytical reasoning and vast knowledge of mathematics, statistics and similar subjects.
1. Mathematical Economics
Mathematical economics is not a branch of Economics, like Monetary Economics or International Economics, but rather the application of Mathematics to the purely theoretical aspects of economic analysis. It mainly expresses economic theory in mathematical form. It does not deal with measurement.
2. Econometrics
As the word 'metric' refers to measurement, Econometrics in simple words means the measurement of economic data to verify an economic theory. It deals with the study of empirical observations for estimation, inference and forecasting.
Econometrics may be defined as "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference" (P.A. Samuelson, T.C. Koopmans and J.R.N. Stone, "Report of the Evaluation Committee for Econometrica," Econometrica, 22(2), 1954, pp. 141-146).
Econometrics may also be defined as "the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena" (Arthur S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964, p. 1).
As empirical studies and theoretical analysis are often complementary and mutually reinforcing, knowledge of both Mathematical Economics and Econometrics is important, but Mathematical Economics may be considered the more basic of the two for undertaking econometric study. Hence, knowledge of Mathematical Economics is useful not only for those interested in theoretical Economics, but also for those seeking a foundation for the pursuit of econometric studies.
In the beginning, statistics was used for econometric studies. Over the years, however, related branches such as operational research, game theory and fuzzy logic have also come to be used for mathematical and econometric studies.
But in a stochastic relationship, randomness is involved. The disturbance term µ represents all those variables that are omitted from the model but that collectively affect Y.
As we have studied, demand (D) is inversely related to the price of the product when other things, such as taste, fashion, and the prices of substitute and complementary goods, remain the same. But due to paucity of time or data, we may not be able to verify whether all other variables remain the same. So, we cannot say that the relationship is exact.
Broadly, traditional econometric methodology proceeds along the following lines:
i. Statement of theory
It is very important to discuss what the existing information/theory says.
As Keynes stated:
The fundamental psychological law … is that men (women) are disposed, as a rule and on average, to increase their consumption as their income increases, but not as much as the increase in their income (John Maynard Keynes, The General Theory of Employment, Interest and Money, Harcourt Brace Jovanovich, New York, 1936, p. 36).
In short, Keynes postulated that the marginal propensity to consume (MPC), the rate of change of consumption for a unit change in income, is greater than zero but less than one. Though Keynes did not give its mathematical form, it can be written as
Y = α + βX; 0 < β < 1
where
Y = consumption expenditure and X = income;
α and β are parameters of the model, the intercept and slope coefficients.
The slope coefficient β measures the MPC. The equation is called the consumption function in Economics, and it states that consumption is linearly related to income.
Y = α + βX + µ
where µ, known as the disturbance, or error, term, is a stochastic variable that has well-defined probabilistic properties. The disturbance term µ may well represent all those factors that affect consumption but are not taken into account explicitly.
Once we have data, the next task is to estimate the parameters of the consumption function. The numerical estimates of the parameters give empirical content to the consumption function. The statistical technique of regression analysis is used to estimate them.
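As a sketch of how such an estimate is obtained, the ordinary least-squares formulas can be applied to a small made-up income/consumption series (the figures below are purely illustrative, not the data behind the estimate quoted next):

```python
# Least-squares fit of the consumption function Y = a + b*X
# (hypothetical income/consumption data, in arbitrary units).
def ols(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx            # slope: estimate of the MPC
    a = my - b * mx          # intercept
    return a, b

income      = [100, 150, 200, 250, 300, 350]
consumption = [ 80, 110, 145, 175, 205, 240]
a, b = ols(income, consumption)
print(round(a, 2), round(b, 2))   # intercept ≈ 15.81, slope ≈ 0.64
```

Since 0 < b < 1, the fitted slope is consistent with Keynes's postulate about the MPC.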
The hat on the Y indicates that it is an estimate. A value of β of 0.63 suggests that, for the given data during the sample period, an increase in real income of one rupee led, on average, to an increase of about 63 paise in real consumption expenditure. We say on average because the relationship between consumption and income is inexact.
Assuming that the fitted model is a reasonably good approximation of reality, we have to test the hypothesis, since a theory or hypothesis that is not verifiable by appeal to empirical evidence may not be admissible as part of scientific enquiry (Milton Friedman, 'The Methodology of Positive Economics', Essays in Positive Economics, University of Chicago Press, Chicago, 1953).
vii. Forecasting or Prediction
If the chosen model does not refute the hypothesis or theory under consideration, we may use it to predict the future value(s) of the dependent, or forecast, variable Y on the basis of the known or expected future value(s) of the explanatory, or predictor, variable X.
Suppose we have estimated the consumption function. Suppose further that the government believes that consumer expenditure of about 2500 thousand crore rupees (in 1999-2000 prices) will raise employment to the required level in the country. Using the calculated MPC, the government will be able to find out by how much income needs to be increased.
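The back-of-the-envelope calculation implied here uses ΔC = MPC × ΔX, so the required income change is ΔX = ΔC / MPC. A sketch with the 0.63 MPC estimate from above and the 2500 figure from the text:

```python
# Required income change to induce a target change in consumption,
# using delta_C = MPC * delta_X  =>  delta_X = delta_C / MPC.
mpc = 0.63                              # estimated MPC from the fitted model
target_consumption_change = 2500        # thousand crore rupees (from the text)
income_change = target_consumption_change / mpc
print(round(income_change, 1))          # ≈ 3968.3 thousand crore rupees
```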
i. What purpose does it have? What economic decision does it help with?
ii. Is there any evidence being presented that allows me to evaluate its quality compared to
alternative theories or models?
The permanent income hypothesis was formulated by the Nobel Prize winning economist
Milton Friedman in 1957. The hypothesis implies that changes in consumption behavior are
not predictable, because they are based on individual expectations. This has broad
implications concerning economic policy. Under this theory, even if economic policies are
successful in increasing income in the economy, the policies may not kick off a multiplier
effect from increased consumer spending. Rather, the theory predicts there will not be an
uptick in consumer spending until workers reform expectations about their future incomes.
(source: http://www.investopedia.com/terms/p/permanent-income-hypothesis.asp dated
22/2/2016)
Milton Friedman was an American economist who received the 1976 Nobel Memorial Prize in Economic Sciences for his research on consumption analysis, monetary history and theory, and the complexity of stabilization policy. (Source: Wikipedia)
Born: July 31, 1912, Brooklyn, New York City, New York, United States
Died: November 16, 2006, San Francisco, California, United States
Influenced by: John Maynard Keynes, Friedrich Hayek
Influenced: Margaret Thatcher, Gary Becker, Mart Laar
ii. LIFE CYCLE PERMANENT INCOME HYPOTHESIS of ROBERT HALL
(Source: Hall, Robert E. (1978), "Stochastic Implications of the Life Cycle-Permanent Income Hypothesis: Theory and Evidence," Journal of Political Economy, vol. 86, no. 6, pp. 971-987)
Robert Ernest "Bob" Hall (born August 13, 1943) is an American economist and the Robert and Carole McNeil Senior Fellow at Stanford University's Hoover Institution. He is generally considered a macroeconomist, but he describes himself as an "applied economist".
Bob Hall received a BA in Economics from the University of California, Berkeley and a PhD in Economics from MIT for a thesis titled Essays on the Theory of Wealth under the supervision of Robert Solow. He is a member of the Hoover Institution and the National Academy of Sciences, a fellow of both the American Academy of Arts and Sciences and the Econometric Society, and a member of the NBER, where he is the program director of the business cycle dating committee. Hall served as President of the American Economic Association in 2010.
Discrete data/variables can be further categorized as either nominal or ordinal:
● Nominal variables are variables that have two or more categories, but which do not have an intrinsic order. For example, a real estate agent could classify their types of property into distinct categories such as houses, condos, co-ops or bungalows. So "type of property" is a nominal variable with 4 categories called houses, condos, co-ops and bungalows. Of note, the different categories of a nominal variable can also be referred to as groups or levels of the nominal variable. Another example of a nominal variable would be classifying where people live in the USA by state. In this case there will be many more levels of the nominal variable (50 in fact). Dichotomous data is an extreme case of nominal data which has only two categories or levels. For example, if we were looking at gender, we would most probably categorize somebody as either "male" or "female". This is an example of a dichotomous variable (and also a nominal variable). Another example might be if we asked a person if they owned a mobile phone. Here, we may categorise mobile phone ownership as either "Yes" or "No". In the real estate agent example, if type of property had been classified as either residential or commercial then "type of property" would be a dichotomous variable.
● Ordinal variables are variables that have two or more categories just like nominal variables, only the categories can also be ordered or ranked. So if you asked someone if they liked the policies of the Democratic Party and they could answer either "Not very much", "They are OK" or "Yes, a lot" then you have an ordinal variable. Why? Because you have 3 categories, namely "Not very much", "They are OK" and "Yes, a lot", and you can rank them from the most positive (Yes, a lot), to the middle response (They are OK), to the least positive (Not very much). However, whilst we can rank the levels, we cannot place a "value" on them; we cannot say that "They are OK" is twice as positive as "Not very much", for example.
● Interval variables are variables whose central characteristic is that they can be measured along a continuum and have a numerical value (for example, temperature measured in degrees Celsius or Fahrenheit). So the difference between 20°C and 30°C is the same as between 30°C and 40°C. However, temperature measured in degrees Celsius or Fahrenheit is NOT a ratio variable.
● Ratio variables are interval variables, but with the added condition that 0 (zero) of the measurement indicates that there is none of that variable. So, temperature measured in degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is no temperature. However, temperature measured in kelvin is a ratio variable, as 0 kelvin (often called absolute zero) indicates that there is no temperature whatsoever. Other examples of ratio variables include height, mass, distance and many more. The name "ratio" reflects the fact that you can use the ratio of measurements. So, for example, a distance of ten metres is twice the distance of five metres.
Can be computed                                    Nominal  Ordinal  Interval  Ratio
Frequency distribution                             Yes      Yes      Yes       Yes
Median and percentiles                             No       Yes      Yes       Yes
Add or subtract                                    No       No       Yes       Yes
Mean, standard deviation, standard error of mean   No       No       Yes       Yes
Ratio, or coefficient of variation                 No       No       No        Yes
Data series –
● If someone else has collected the data, it is called secondary data. You must give the source of the data.
● If data is not available, then collect data yourself; this is called primary data. Primary data can be collected in any of the following forms:
● Cross-Sectional Data: information collected from a population or sample at one point in time, as in medical or social research.
● Time Series Data: a sequence of data points, typically consisting of successive measurements made over a time interval.
● Pooled Data: both cross-sectional and time series, but the sample differs across periods.
● Panel Data: cross-sectional as well as time series data on the same sample.
Topic-2: Sampling
Census – When you enumerate all the units, it is called a census, which is time-consuming and needs a lot of money. You can draw a sample from the census.
Sample – A subset of a population. The data sample may be drawn from a population without replacement, or with replacement, in which case it is a multi-subset.
Random or Probability Sample – Each individual member of the population has a known, non-zero chance of being selected as part of the sample. Types of sampling:
1. Simple random sample – A simple random sample is a subset of a statistical population in which each member of the subset has an equal probability of being chosen. An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees.
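The "names out of a hat" example can be reproduced with Python's standard library (the employee names below are placeholders):

```python
import random

# Simple random sample: 25 employees drawn from a staff of 250,
# each with equal probability, without replacement.
employees = [f"employee_{i}" for i in range(1, 251)]
sample = random.sample(employees, 25)
print(len(sample))   # 25 distinct names
```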
2. Systematic random sample – Systematic sampling is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point and a fixed periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size.
3. Stratified random sample – A stratified random sample is a random sample in which members of the population are first divided into strata, then are randomly selected to be a part of the sample.
4. Cluster random sample – With cluster sampling, the researcher divides the population into separate groups, called clusters. Then, a simple random sample of clusters is selected from the population.
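A minimal sketch of systematic sampling, assuming a hypothetical population of 1000 units and a desired sample of 50:

```python
import random

# Systematic sampling: pick a random start, then every k-th unit,
# where k (the sampling interval) = population size // desired sample size.
population = list(range(1, 1001))   # hypothetical 1000 units
n = 50                              # desired sample size
k = len(population) // n            # sampling interval = 20
start = random.randrange(k)         # random starting point in [0, k)
sample = population[start::k]       # every k-th unit from the start
print(k, len(sample))               # 20 50
```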
Non-Random or Non-Probability Sample – If a sample is not a random or probability sample, then it is a non-probability sample. Types:
1. Convenience sampling – Convenience sampling is a non-probability sampling technique where subjects are selected because of their convenient accessibility and proximity to the researcher.
2. Judgment sampling – A non-probability sampling technique where the researcher selects units to be sampled based on their knowledge and professional judgment. This type of sampling technique is also known as purposive sampling and authoritative sampling.
Sample Size
(A larger sample can yield more accurate results, but collecting a large number of responses can be pricey.)

Margin of Error (Confidence Interval) – No sample will be perfect, so you need to decide how much error to allow. The confidence interval determines how much higher or lower than the population mean you are willing to let your sample mean fall. If you've ever seen a political poll on the news, you've seen a confidence interval. It will look something like this: "68% of voters said yes to Proposition Z, with a margin of error of +/- 5%."

Confidence Level – How confident do you want to be that the actual mean falls within your confidence interval? The most common confidence levels are 90%, 95% and 99%.

Standard Deviation – How much variance do you expect in your responses? Since we haven't actually administered our survey yet, the safe decision is to use 0.5; this is the most forgiving number and ensures that your sample will be large enough.

Your confidence level corresponds to a Z-score. This is a constant value needed for this equation. Here are the z-scores for the most common confidence levels:
● 90% – Z Score = 1.645
● 95% – Z Score = 1.96
● 99% – Z Score = 2.576
1. Sample size for a known population
Known population size N = 5708.
Assuming that you use a 95% confidence level, the error level is 0.05. The minimum sample size is calculated by using the Yamane formula (margin of error e = 0.05):
n = N / (1 + N·e²)
The calculation follows:
n = 5708 / (1 + 5708(0.0025))
= 5708 / (1 + 14.27)
= 5708 / 15.27
= 373.80
The required sample size is about 374.
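The Yamane calculation above can be checked in a few lines (N = 5708 and e = 0.05 are the figures used in the worked example):

```python
import math

# Yamane's formula for minimum sample size from a known population:
# n = N / (1 + N * e^2)
def yamane(N, e):
    return N / (1 + N * e ** 2)

n = yamane(5708, 0.05)
print(round(n, 2), math.ceil(n))   # ≈ 373.8, rounded up to 374
```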
2. Sample size for an unknown population
Where the population is unknown, the sample size can be derived by computing the minimum sample size required for accuracy in estimating proportions, considering the standard normal deviate set at the 95% confidence level (1.96), the percentage picking a choice or response (50% = 0.5) and the margin of error (0.05 = ±5%). The formula is:

n = Z²·p·q / e²

where:
· Z = standard normal deviate set at the desired confidence level (1.96 for 95%),
· p = the (estimated) proportion of the population which has the attribute in question,
· q = 1 − p,
· e = the desired level of precision (i.e. the margin of error).

For an unknown population at the 95% confidence level and a 5% margin of error, n = (1.96)²(0.5)(0.5) / (0.05)² = 384.16, which was rounded to 400.
The z-value is found in a Z table.
Here is how the math works assuming you chose a 95% confidence level, .5 standard
deviation, and a margin of error (confidence interval) of +/- 5%.
(1.96)² × 0.5 × (1 − 0.5) / (0.05)² = 0.9604 / 0.0025 = 384.16
385 respondents are needed.
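The same arithmetic as a sketch in code (z = 1.96, p = 0.5, e = 0.05, as above):

```python
import math

# Cochran's formula for an unknown population:
# n = z^2 * p * (1 - p) / e^2
def cochran(z, p, e):
    return z ** 2 * p * (1 - p) / e ** 2

n = cochran(1.96, 0.5, 0.05)
print(round(n, 2), math.ceil(n))   # 384.16, rounded up to 385
```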
If you find your sample size is too large to handle, try slightly decreasing your confidence
level or increasing your margin of error – this will increase the chance for error in your
sampling, but it can greatly decrease the number of responses you need.
1. In order to estimate the sample size, three issues need to be studied: the level of precision, the confidence or risk level, and the variability. Regarding the last issue, on which your question is concentrated, the degree of variability in the attributes being measured refers to the distribution of attributes in the population.
2. The more heterogeneous a population, the larger the sample size required to obtain a given level of precision. The less variable (more homogeneous) a population, the smaller the sample size.
Note that a proportion of 50% indicates a greater level of variability than either 20% or 80%.
This is because 20% and 80% indicate that a large majority do not or do, respectively, have
the attribute of interest. Because a proportion of .5 indicates the maximum variability in a
population, it is often used in determining a more conservative sample size, that is, the
sample size may be larger than if the true variability of the population attribute were used.
Median – Ordinal, interval or ratio level; skewed, open or incomplete data. Divides the distribution into two equal parts; requires arrangement of data in a sequence. Best measure of central tendency for skewed and open-ended data.
Mode – Nominal, ordinal, interval or ratio level. The most frequently occurring data value; there may be more than one; coincides with the mean and median in the case of a normal distribution. Used in marketing research and preference-type data.
Geometric Mean – Ratio level. Average rate of change; always less than the AM. Used for growth rates such as that of Gross Domestic Product; useful in banking.
Harmonic Mean – Ratio level. The HM is always less than or equal to the geometric mean. Used for averaging speeds and other ratios.
Quartiles (Q1 = 1st quartile, Q2 = median or 2nd quartile, Q3 = 3rd quartile) – Ordinal, interval and ratio level. Divide the distribution into four equal parts, each containing 25% of the data values; relatively insensitive, requires ordering of data. Useful for skewed data sets when sensitive measures are not meaningful.
Deciles – Ordinal, interval or ratio level. Divide the distribution into ten equal parts, each decile containing 10% of the data values; relatively insensitive, requires ordering of data. Suitable for defining the location of data values within a distribution.
Percentiles – Ordinal, interval or ratio level. Divide the distribution into 100 equal parts; relatively insensitive, requires ordering of data. Useful for providing a fine description of a data set.
1. Levene's Test for Homogeneity of Variances is included in the output when you run a one-way ANOVA in SPSS Statistics.
Topic-4: Measures of Dispersion
A negative value means the curve is platykurtic, and a zero value means the curve is normal (mesokurtic); kurtosis measures the concentration of values around the mean and in the tails of the distribution.
Topic-5: Probability
Discrete Probability – A discrete distribution describes the probability of occurrence of each value of a discrete random variable. A discrete random variable is a random variable that has countable values, such as a list of non-negative integers.
Binomial Distribution | Poisson Distribution | Normal Distribution
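For illustration, the probability mass functions of the two discrete distributions named above can be evaluated directly (the normal distribution, being continuous, has a density instead; the example parameters are arbitrary):

```python
import math

# Binomial pmf: probability of k successes in n trials with success probability p.
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson pmf: probability of k events when the mean event rate is lam.
def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# P(exactly 3 heads in 10 fair coin tosses)
print(round(binomial_pmf(3, 10, 0.5), 4))   # 0.1172
# P(exactly 2 arrivals when the mean arrival rate is 2)
print(round(poisson_pmf(2, 2.0), 4))        # 0.2707
```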
Chi-square test: measures the association between cross-tabulated variables.
Wilcoxon Test – tests whether samples are from the same population; when normality cannot be assumed, the Mann-Whitney test is a better choice.
http://www.graphpad.com/guides/prism/6/statistics/index.htm?stat_qa_choosing_a_test_to_compare_.htm
If I have data from three or more groups, is it OK to compare two groups at
a time with a t test?
No. You should analyze all the groups at once with one-way ANOVA, and then
follow up with multiple comparison tests. The only exception is when some of the
'groups' are really controls to prove the assay worked, and are not really part of
the experimental question you are asking.
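To make the point concrete, the one-way ANOVA F statistic can be computed from scratch for three hypothetical groups (a sketch of the calculation only, not Prism's implementation):

```python
# One-way ANOVA F statistic: compares between-group variance to
# within-group variance across all groups at once.
def one_way_anova_F(groups):
    k = len(groups)                      # number of groups
    N = sum(len(g) for g in groups)      # total observations
    grand = sum(sum(g) for g in groups) / N
    # between-group and within-group sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

F = one_way_anova_F([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(round(F, 2))   # 19.0
```

A large F (compared against the F distribution with k − 1 and N − k degrees of freedom) indicates that at least one group mean differs; multiple comparison tests then identify which.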
I know the mean, SD (or SEM) and sample size for each group. Which tests
can I run?
You can enter data as mean, SD (or SEM) and N, and Prism can compute an unpaired t test. Prism cannot perform a paired test, as that requires analyzing each pair. It also cannot do any nonparametric tests, as these require ranking the data.
I only know the two group means, and don't have the raw data and don't
know their SD or SEM. Can I run a t test?
No. The t test compares the difference between two means and compares that
difference to the standard error of the difference, computed from the standard
deviations and sample size. If you only know the two means, there is no possible
way to do any statistical comparison.
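A sketch of the summary-statistics t test described above (equal-variance Student form; the numbers are hypothetical):

```python
import math

# Unpaired (Student) t statistic from summary statistics only:
# mean, SD and N per group.
def t_from_stats(m1, sd1, n1, m2, sd2, n2):
    # pooled variance assumes equal variances in the two groups
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference
    return (m1 - m2) / se, n1 + n2 - 2        # t statistic, degrees of freedom

t, df = t_from_stats(10.0, 2.0, 15, 8.0, 2.0, 15)
print(round(t, 3), df)   # 2.739 28
```

This is exactly why the SD (or SEM) is indispensable: without it the standard error of the difference, and hence t, cannot be computed.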
Can I use a normality test to make the choice of when to use a
nonparametric test?
It is not a good idea to base your decision solely on the normality test. Choosing
when to use a nonparametric test is not a straightforward decision, and you can't
really automate the process.
I want to compare two groups. The outcome has two possibilities, and I
know the fraction of each possible outcome in each group. How can I
compare the groups?
Not with a t test. Enter your data into a contingency table and analyze
with Fisher's exact test.
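A from-scratch sketch of Fisher's exact test for a 2×2 table, built from the hypergeometric distribution (two-sided, summing all tables no more probable than the observed one; the example table is hypothetical):

```python
import math

# Fisher's exact test (two-sided) for the 2x2 contingency table
# [[a, b], [c, d]], with row and column margins held fixed.
def fisher_exact(a, b, c, d):
    n = a + b + c + d
    def hyper(x):
        # hypergeometric probability of a table with x in the top-left cell
        return (math.comb(a + b, x) * math.comb(c + d, (a + c) - x)
                / math.comb(n, a + c))
    p_obs = hyper(a)
    lo = max(0, (a + c) - (c + d))
    hi = min(a + b, a + c)
    # two-sided p: sum probabilities of all tables at most as likely as observed
    return sum(hyper(x) for x in range(lo, hi + 1) if hyper(x) <= p_obs + 1e-12)

p = fisher_exact(1, 9, 11, 3)
print(round(p, 4))   # 0.0028
```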
I want to compare the mean survival time in two groups. But some
subjects are still alive so I don't know how long they will live. How can I
do a t test on survival times?
You should use special methods designed to compare survival curves. Don't run a
t test on survival times.
I don't know whether it is ok to assume equal variances. Can't a statistical
test tell me whether or not to use the Welch t test?
While that sounds like a good idea, in fact it is not. The decision really should be
made as part of the experimental design and not based on inspecting the data.
I don't know whether it is better to use the regular paired t test or the ratio
test. Is it ok to run both, and report the results with the smallest P value?
No. The results of any statistical test can only be interpreted at face value when the
choice of analysis method was part of the experimental design.