
HU-351 Mathematical Economics & Econometrics

3 credit course
Syllabus Hours (Total - 42)

Unit  Contents  Contact Hrs

1 Introduction 6
1.1 What is Econometrics? Why a separate discipline? How it is different
from Mathematical Economics; Types of Data; Sources of Data
1.2 Estimating Economic Relationship, Methodology of Econometrics
1.3 Matrix and its Application in Economics
2 Review of Calculus 6
2.1 Differential Calculus and its application in Economics - Elasticity of Demand: Price and
Cross; Profit Maximization under Perfect Competition, Monopoly, Oligopoly and
Monopolistic Competition
2.2 Integral Calculus and its application in Economics - Capital Formation, Compound
Interest; Capital Value and Flow Value; Consumer Surplus under Pure Competition
and Monopoly; Producer's Surplus
2.3 Differential Equations and their application in Economics - Market Price Function;
Dynamic Multiplier
3 Review of Statistics 6
3.1 Basic Ingredients of an Empirical Study - Formulating a Model; Gathering Data;
Descriptive Statistics and its use in Business - Measures of Central Tendency: AM,
GM and HM, Median, Mode; Dispersion: Range, Quartile, Standard Deviation,
Skewness, Kurtosis
3.2 Probability - Discrete and Continuous; Probability Distributions: Binomial and
Poisson Distribution, Normal Distribution
Mid semester
3 Review of Statistics (contd.) 6
3.3 Sampling techniques, Estimation and Hypothesis Testing, Interpreting the
results

4 Regression 6
Statistical versus Deterministic Relationships; Regression versus Causation; Two-
variable Regression Analysis; Population Regression Function (PRF); Stochastic
Specification of the PRF; The Significance of the Stochastic Disturbance Term; The
Sample Regression Function (SRF); Method of Ordinary Least Squares; Properties of
Least Squares Estimators: The Gauss-Markov Theorem; Coefficient of Determination
r²: A Measure of "Goodness of Fit"; Monte Carlo Experiments
5 Classical Normal Linear Regression Model (CNLRM) 6
The Probability Distribution of Disturbances (µ); Normality
Assumption; Method of Maximum Likelihood
Multiple Regression Analysis: The Problem of Estimation; The
Problem of Inference; Cobb-Douglas Production Function;
Polynomial Regression Model; Testing for Structural or Parametric
Stability of Regression Models: The Chow Test

6 Presentation on Application of Mathematics, Statistics, Operational 6 hrs.
Research, Computer Science or any other related subject to discuss
any aspect of Economics

HU-351 Mathematical Economics & Econometrics

4 credit course
Syllabus Hours (Total - 56)

Unit  Contents  Contact Hrs

1 Introduction 6
1.1 What is Econometrics? Why a separate discipline? How it is different
from Mathematical Economics; Types of Data; Sources of Data
1.2 Estimating Economic Relationship, Methodology of Econometrics
1.3 Matrix and its Economic Application
2 Review of Calculus 12
2.1 Differential Calculus and its application in Economics - Elasticity of Demand: Price and
Cross; Profit Maximization under Perfect Competition, Monopoly, Oligopoly and
Monopolistic Competition
2.2 Integral Calculus and its application in Economics - Capital Formation, Compound
Interest; Capital Value and Flow Value; Consumer Surplus under Pure Competition
and Monopoly; Producer's Surplus
2.3 Differential Equations and their application in Economics - Market Price Function;
Dynamic Multiplier

3 Review of Statistics 14
3.1 Basic Ingredients of an Empirical Study - Formulating a Model; Gathering Data;
Descriptive Statistics and its use in Business - Measures of Central Tendency: AM,
GM and HM, Median, Mode; Dispersion: Range, Quartile, Standard Deviation,
Skewness, Kurtosis
3.2 Probability - Discrete and Continuous; Probability Distributions: Binomial and
Poisson Distribution
3.3 Sampling Techniques, Estimation and Hypothesis Testing, Interpreting the
Results

Mid semester

4 Regression 8 Hours

Statistical versus Deterministic Relationships; Regression versus Causation; Two-
variable Regression Analysis; Population Regression Function (PRF); Stochastic
Specification of the PRF; The Significance of the Stochastic Disturbance Term; The
Sample Regression Function (SRF); Method of Ordinary Least Squares; Properties of
Least Squares Estimators: The Gauss-Markov Theorem; Coefficient of Determination
r²: A Measure of "Goodness of Fit"; Monte Carlo Experiments
5 Classical Normal Linear Regression Model (CNLRM) 4 Hours

The Probability Distribution of Disturbances (µ); Normality Assumption;
Method of Maximum Likelihood
Multiple Regression Analysis: The Problem of Estimation; The Problem of
Inference; Cobb-Douglas Production Function; Polynomial Regression Model;
Testing for Structural or Parametric Stability of Regression Models: The Chow Test
6 Dummy Variable (DV) 6 Hours

Nature; ANOVA Models; Regression with a Mixture of Quantitative and
Qualitative Regressors: The ANCOVA Models; DV Alternative to the Chow Test;
Interaction Effects using Dummy Variables; Use of DV in Seasonal Analysis
7 Presentation on Application of Mathematics, Statistics, Operational 6 hrs.
Research, Computer Science or any other related subject to discuss
any aspect of Economics

Suggested books

Sl. No.  Name of Books, Authors, Publishers                            Year of Publication/Reprint

1.  Jeffrey Wooldridge, Introductory Econometrics, Cengage Learning.   2014
    ISBN-13: 978-81-315-1673-7; ISBN-10: 81-315-1673-3
2.  Damodar N. Gujarati, Basic Econometrics, McGraw Hill Education     Net version is available
    (India) Limited, Fifth Edition. ISBN-13: 978-0-07-133345-0;
    ISBN-10: 0-07-133345-2
3.  Ramu Ramanathan, Introductory Econometrics with Applications,      Latest
    Harcourt Brace Jovanovich Publishers, USA. ISBN-

Index
S.N. Subject Page no
1 First Note 3

Introduction
In recent times, Mathematical Economics and Econometrics have attracted the attention of
engineering students. Two of the important reasons are, first, the need to understand economic
forces and develop models at the micro level and, second, the need to understand the economic
environment at the macro level. Some students also like to explore other disciplines where they
can use their basic analytical reasoning and vast knowledge of mathematics, statistics and other
similar subjects.

1. Mathematical Economics

Mathematical Economics is not a branch of Economics in the way that Monetary Economics and
International Economics are, but simply the application of Mathematics to the purely theoretical
aspects of economic analysis. It mainly expresses economic theory in mathematical form. It does
not deal with measurement.

2. Econometrics

As the word 'metric' refers to measurement, Econometrics in simple words means the measurement
of economic data to verify an economic theory. It deals with the study of empirical observations
for estimation, inference and forecasting.

Econometrics may be defined as the quantitative analysis of actual economic phenomena based on
the concurrent development of theory and observation, related by appropriate methods of
inference (P.A. Samuelson, T.C. Koopmans and J.R.N. Stone, "Report of the Evaluation Committee
for Econometrica," Econometrica, 22(2), 1954, pp. 141-146).

Econometrics may be defined as the social science in which the tools of economic theory,
mathematics, and statistical inference are applied to the analysis of economic phenomena (Arthur
S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964, p. 1).

As empirical studies and theoretical analysis are often complementary and mutually reinforcing,
knowledge of both Mathematical Economics and Econometrics is important, but knowledge of
Mathematical Economics may be considered the more basic of the two for undertaking
econometric study. Hence, knowledge of Mathematical Economics is useful not only for those who
are interested in theoretical Economics, but also for those seeking a foundation for the pursuit of
econometric studies.

In the beginning, statistics was used for econometric studies. Over the years, however, other
related branches such as operational research, game theory, fuzzy logic etc. have also come to be
used for Mathematical Economics and Econometrics studies.

3. Deterministic and stochastic Relationship

In Mathematics and Physics, a deterministic system is one in which no randomness is involved in
the development of the future states of the system. A deterministic model will therefore always
produce the same output from a given starting condition. In Economics, a production function is a
deterministic model, and the price elasticity of demand is a deterministic relationship. The
Ramsey-Cass-Koopmans model is a deterministic model.

4
In a stochastic relationship, by contrast, randomness is involved.

Significance of Stochastic Disturbance Term

The disturbance term µ represents all those variables that are omitted from the model
but that collectively affect Y.

Assumption of 'Ceteris Paribus' (when other variables remain the same)

As we have studied, demand is inversely related to the price of the product when other
things such as taste, fashion, and the prices of substitute and complementary goods
remain the same. But due to paucity of time or data, we may not be able to verify
whether all the other variables do in fact remain the same. So we cannot say that the
relationship is exact; the disturbance term stands in for these omitted influences.
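
To make the distinction concrete, here is a minimal Python sketch (all numbers are hypothetical) in which the deterministic consumption function returns the same output on every run, while the stochastic version adds a random disturbance µ:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta = 50.0, 0.8          # hypothetical intercept and slope
X = np.linspace(100, 200, 5)     # illustrative income values

Y_deterministic = alpha + beta * X                                # same output every run
Y_stochastic = alpha + beta * X + rng.normal(0, 5, size=X.size)   # adds a random disturbance µ

print(Y_deterministic)
print(Y_stochastic)
```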

4. Ingredients of an Econometric Model (a five-mark question may be asked on this; a
diagram should be added)

Broadly, traditional econometric methodology proceeds along the following lines:

i. Statement of theory
It is very important to discuss what the existing information/theory says.

As Keynes stated:

The fundamental psychological law... is that men (women) are disposed, as a rule and on
average, to increase their consumption as their income increases, but not by as much as the
increase in their income (John Maynard Keynes, The General Theory of Employment,
Interest and Money, Harcourt Brace Jovanovich, New York, 1936, p. 36).

In short, Keynes postulated that the marginal propensity to consume (MPC), the rate of
change of consumption for a unit change in income, is greater than zero but less than one.

• John Maynard Keynes


• 5 June 1883 – 21 April 1946 (aged 62)
• Nationality - ​British
• Field- political economy, probability
• School of thought – Keynesian economics
• Contributions – MACROECONOMICS
• Liquidity preference theory
• Spending multiplier
• AD–AS model
• Deficit spending
• The General Theory of Employment, Interest and Money, 1936

ii. Specification of the Mathematical Model of the theory

Though Keynes did not discuss its mathematical form, it can be specified as:

Y = α + βX ;  0 < β < 1

Where

5
Y = consumption expenditure and X = income

α and β are parameters of the model; the intercept and slope coefficients.

The slope coefficient β measures the MPC. The equation is called the consumption function
in Economics, and it states that consumption is linearly related to income.

iii. Specification of the statistical, or econometric model

The purely mathematical model of the consumption function is of limited use to the
econometrician because it assumes an exact, or deterministic, relationship between
consumption and income. But relationships between economic variables are generally
inexact. To allow for inexact relationships between economic variables, the
econometrician modifies the deterministic consumption function to:

Y = α + βX + µ

where µ, known as the disturbance or error term, is a stochastic variable that has well-
defined probabilistic properties. The disturbance term µ may well represent all those
factors that affect consumption but are not taken into account explicitly.

iv. Obtaining the data

To estimate the econometric model, the numerical values of α and β need to be obtained,
for which we need to collect data.

v. Estimation of the Parameters of the Econometric model

Once we have the data, the next task is to estimate the parameters of the consumption
function. The numerical estimates of the parameters give empirical content to the
consumption function. The statistical technique of regression analysis is used to obtain
the estimates, for example:

Ŷ = 103736.0493 + 0.6303 X

The hat on the Y indicates that it is an estimate. The estimated value of β is about 0.63,
suggesting that for the given data during the sample period, an increase in real income of
one rupee led, on average, to an increase of about 63 paise in real consumption
expenditure. We say "on average" because the relationship between consumption and
income is inexact.
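
As a sketch of this estimation step (using made-up data, not the sample behind the figures above), the OLS estimates of α and β can be computed directly from the least-squares formulas:

```python
import numpy as np

# Hypothetical income (X) and consumption (Y) data, for illustration only
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70,  85, 102, 110, 122, 135, 148, 160, 175, 189], dtype=float)

# OLS: beta_hat = sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()           # intercept from the means

Y_hat = alpha_hat + beta_hat * X                     # fitted values
r2 = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)  # goodness of fit

print(f"MPC (beta_hat) = {beta_hat:.4f}, intercept = {alpha_hat:.4f}, r² = {r2:.4f}")
```

The same estimates, along with the standard errors and t-statistics needed for the hypothesis-testing step that follows, would usually be obtained from a library such as statsmodels.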

vi. Hypothesis testing

Assuming that the fitted model is a reasonably good approximation of reality, we have to
test whether the estimates accord with the expectations of the theory being tested. As
Milton Friedman notes, a theory or hypothesis that is not verifiable by appeal to empirical
evidence may not be admissible as part of scientific enquiry (Milton Friedman, 'The
Methodology of Positive Economics', Essays in Positive Economics, University of Chicago
Press, Chicago, 1953).

vii. Forecasting or Prediction

If the chosen model does not refute the hypothesis or theory under consideration, we may
use it to predict the future value(s) of the dependent, or forecast, variable Y on the basis
of the known or expected future value(s) of the explanatory, or predictor, variable X.

viii. Using the model for Control or Policy purposes

Suppose we have estimated the consumption function. Suppose further that the government
believes that consumer expenditure of about 2500 thousand crore rupees (at 1999-2000
prices) will sustain the required employment level in the country. Using the estimated
MPC, the government can then work out the rate at which income needs to be increased
to generate that level of consumption expenditure.

Choosing among Competing Models

Advice given by Clive Granger is worth keeping in mind:

i. What purpose does it have? What economic decision does it help with?
ii. Is there any evidence being presented that allows me to evaluate its quality compared to
alternative theories or models?

The following are two post-Keynesian models of income, so they can be said to be an
improvement over the Keynesian model.
i. PERMANENT INCOME HYPOTHESIS OF MILTON FRIEDMAN

The permanent income hypothesis was formulated by the Nobel Prize-winning economist
Milton Friedman in 1957. The hypothesis implies that changes in consumption behavior are
not predictable, because they are based on individual expectations. This has broad
implications concerning economic policy. Under this theory, even if economic policies are
successful in increasing income in the economy, the policies may not kick off a multiplier
effect from increased consumer spending. Rather, the theory predicts there will not be an
uptick in consumer spending until workers reform expectations about their future incomes.
(Source: http://www.investopedia.com/terms/p/permanent-income-hypothesis.asp, dated
22/2/2016)
Milton Friedman was an American economist who received the 1976 Nobel Memorial Prize
in Economic Sciences for his research on consumption analysis, monetary history and
theory and the complexity of stabilization policy. (Wikipedia)
Born: July 31, 1912, Brooklyn, New York City, New York, United States
Died: November 16, 2006, San Francisco, California, United States
Influenced by: John Maynard Keynes, Friedrich Hayek, and others
Influenced: Margaret Thatcher, Gary Becker, Mart Laar, and others

ii. LIFE CYCLE PERMANENT INCOME HYPOTHESIS of ROBERT HALL

(Source: Hall, Robert E. (1978), "Stochastic Implications of the Life Cycle-Permanent
Income Hypothesis: Theory and Evidence", Journal of Political Economy, vol. 86, no. 6,
pp. 971-987)

Robert Ernest "Bob" Hall (born August 13, 1943) is an American economist and a Robert
and Carole McNeil Senior Fellow at Stanford University's Hoover Institution. He is generally
considered a macroeconomist, but he describes himself as an "applied economist".
Bob Hall received a BA in Economics at the University of California, Berkeley and a PhD in
Economics from MIT for a thesis titled Essays on the Theory of Wealth under the supervision
of Robert Solow. He is a member of the Hoover Institution and the National Academy of
Sciences, a fellow of both the American Academy of Arts and Sciences and the Econometric
Society, and a member of the NBER, where he is the program director of the business cycle
dating committee. Hall served as President of the American Economic Association in 2010.

Types of data/ Variables

Discrete or Continuous data

i. Categorical variables are also called discrete variables.
ii. Continuous variables are also called quantitative variables.

Discrete Data                                  Continuous Data
It can take only certain values in a range.    It can take any value in a range.
It is counted.                                 It is measured.
Example - throwing a die or tossing a coin.    Example - distance, time etc.; every
                                               fraction of a value is meaningful.

Discrete data/variables can be further categorized as either nominal or ordinal:

● Nominal variables are variables that have two or more categories, but which do not
have an intrinsic order. For example, a real estate agent could classify their types of
property into distinct categories such as houses, condos, co-ops or bungalows. So
"type of property" is a nominal variable with 4 categories called houses, condos,
co-ops and bungalows. Of note, the different categories of a nominal variable can also
be referred to as groups or levels of the nominal variable. Another example of a
nominal variable would be classifying where people live in the USA by state. In this
case there will be many more levels of the nominal variable (50 in fact). Dichotomous
data is an extreme case of nominal data which has only two categories or levels. For
example, if we were looking at gender, we would most probably categorize somebody
as either "male" or "female". This is an example of a dichotomous variable (and also a
nominal variable). Another example might be if we asked a person if they owned a
mobile phone. Here, we may categorise mobile phone ownership as either "Yes" or
"No". In the real estate agent example, if type of property had been classified as either
residential or commercial then "type of property" would be a dichotomous variable.
● Ordinal​ variables are variables that have two or more categories just like nominal
variables only the categories can also be ordered or ranked. So if you asked someone
if they liked the policies of the Democratic Party and they could answer either "Not
very much", "They are OK" or "Yes, a lot" then you have an ordinal variable. Why?
Because you have 3 categories, namely "Not very much", "They are OK" and "Yes, a
lot" and you can rank them from the most positive (Yes, a lot), to the middle response
(They are OK), to the least positive (Not very much). However, whilst we can rank
the levels, we cannot place a "value" to them; we cannot say that "They are OK" is
twice as positive as "Not very much" for example.

Continuous data/variables can be further categorized as either interval or ratio variables.

● Interval variables are variables whose central characteristic is that they can
be measured along a continuum and they have a numerical value (for example,
temperature measured in degrees Celsius or Fahrenheit). So the difference between
20°C and 30°C is the same as the difference between 30°C and 40°C. However,
temperature measured in degrees Celsius or Fahrenheit is NOT a ratio variable.
● Ratio variables are interval variables, but with the added condition that 0 (zero) of the
measurement indicates that there is none of that variable. So, temperature measured in
degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is
no temperature. However, temperature measured in Kelvin is a ratio variable, as 0
Kelvin (often called absolute zero) indicates that there is no temperature whatsoever.
Other examples of ratio variables include height, mass, distance and many more. The
name "ratio" reflects the fact that you can use the ratio of measurements. So, for
example, a distance of ten metres is twice the distance of 5 metres.

Can be computed                Nominal   Ordinal   Interval   Ratio
Frequency distribution         Yes       Yes       Yes        Yes
Median and percentiles         No        Yes       Yes        Yes
Add or subtract                No        No        Yes        Yes
Mean, standard deviation,
  standard error of the mean   No        No        Yes        Yes
Ratio, or coefficient of
  variation                    No        No        No         Yes
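
As an illustration of the distinction (hypothetical data), pandas represents nominal data as unordered categoricals and ordinal data as ordered categoricals, which matches the table above: frequency counts are valid for both, while ranking operations require an order:

```python
import pandas as pd

# Nominal: categories with no intrinsic order (hypothetical property types)
property_type = pd.Categorical(
    ["house", "condo", "co-op", "bungalow", "condo"], ordered=False)

# Ordinal: categories with a rank order (hypothetical survey responses)
opinion = pd.Categorical(
    ["Not very much", "They are OK", "Yes, a lot", "They are OK"],
    categories=["Not very much", "They are OK", "Yes, a lot"], ordered=True)

print(pd.Series(property_type).value_counts())   # frequency distribution: valid for nominal
print(pd.Series(opinion).min())                  # ordering/minimum: valid only for ordinal
```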

Data series

Secondary vs. primary data

● If someone else has collected the data, it is called secondary data. The source of the
data must be given.
● If data is not available, then collect it yourself; this is called primary data. Primary data
can be collected in any of the following forms:
● Cross-Sectional Data: as in medical or social research, information is collected from a
population or sample at one point in time.
● Time Series Data: a sequence of data points, typically consisting of successive
measurements made over a time interval.
● Pooled Data: a combination of cross-sectional and time series data, but with a different
sample in each period.
● Panel Data: cross-sectional as well as time series data on the same sample.

Topic 2: Sampling

Census - When you enumerate all the units, it is called a census, which is time-taking and
needs a lot of money. You can draw a sample from the census.

Sample - A sample is a subset of a population. The data sample may be drawn from a
population without replacement, or with replacement, in which case it is a multi-subset.

Random or Probability Sample - Each individual member of the population has a known,
non-zero chance of being selected as part of the sample. Types of random sampling:

1. Simple random sampling - A simple random sample is a subset of a statistical
population in which each member of the subset has an equal probability of being chosen.
An example of a simple random sample would be the names of 25 employees being
chosen out of a hat from a company of 250 employees.
2. Systematic random sampling - Systematic sampling is a type of probability sampling
method in which sample members from a larger population are selected according to a
random starting point and a fixed periodic interval. This interval, called the sampling
interval, is calculated by dividing the population size by the desired sample size.
3. Stratified random sampling - A stratified random sample is a random sample in which
members of the population are first divided into strata, then randomly selected to be
part of the sample.
4. Cluster random sampling - With cluster sampling, the researcher divides the population
into separate groups, called clusters. Then, a simple random sample of clusters is
selected from the population.

Non-Random or Non-Probability Sample - If a sample is not a random or probability
sample, then it is a non-probability sample. Types of non-random sampling:

1. Convenience sampling - Convenience sampling is a non-probability sampling technique
where subjects are selected because of their convenient accessibility and proximity to
the researcher.
2. Judgement sampling - A non-probability sampling technique where the researcher
selects units to be sampled based on their knowledge and professional judgement. This
type of sampling technique is also known as purposive sampling and authoritative
sampling.
3. Quota sampling - Quota sampling is a non-probability sampling technique wherein the
assembled sample has the same proportions of individuals as the entire population with
respect to known characteristics, traits or focused phenomenon.
4. Snowball sampling - In sociology and statistics research, snowball sampling (or chain
sampling, chain-referral sampling, referral sampling) is a non-probability sampling
technique where existing study subjects recruit future subjects from among their
acquaintances.
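
A minimal Python sketch of how the probability-sampling schemes above might be drawn, over a hypothetical population of 250 employee IDs:

```python
import random

random.seed(42)
population = list(range(1, 251))          # hypothetical population of 250 employee IDs

# Simple random sample: every member has an equal chance of selection
srs = random.sample(population, k=25)

# Systematic sample: random start, then every k-th unit (k = N / n)
k = len(population) // 25
start = random.randrange(k)
systematic = population[start::k]

# Stratified sample: split into strata, then sample randomly within each stratum
strata = {"A": population[:100], "B": population[100:]}
stratified = [unit for group in strata.values() for unit in random.sample(group, k=10)]

print(len(srs), len(systematic), len(stratified))
```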
Sample Size

(A larger sample can yield more accurate results, but gathering a large number of responses can
be pricey.)

Margin of Error          No sample will be perfect, so you need to decide how much error
(Confidence Interval)    to allow. The confidence interval determines how much higher or
                         lower than the population mean you are willing to let your sample
                         mean fall. If you've ever seen a political poll on the news, you've
                         seen a confidence interval. It will look something like this: "68% of
                         voters said yes to Proposition Z, with a margin of error of +/- 5%."
Confidence Level         How confident do you want to be that the actual mean falls within
                         your confidence interval? The most common confidence levels are
                         90%, 95%, and 99%.
Standard Deviation       How much variance do you expect in your responses? Since we
                         haven't actually administered our survey yet, the safe decision is
                         to use 0.5 - this is the most forgiving number and ensures that
                         your sample will be large enough.
Your confidence level corresponds to a Z-score. This is a constant value needed for this
equation. Here are the Z-scores for the most common confidence levels:

● 90% - Z-score = 1.645

● 95% - Z-score = 1.96

● 99% - Z-score = 2.576

If you choose a different confidence level, use a Z-score table to find your score.
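
Instead of a lookup table, the Z-score for any confidence level can be computed from the inverse CDF of the standard normal distribution; a sketch using scipy:

```python
from scipy.stats import norm

def z_score(confidence_level: float) -> float:
    """Two-sided Z-score for a given confidence level, e.g. 0.95 -> 1.96."""
    return norm.ppf(1 - (1 - confidence_level) / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%}: Z = {z_score(cl):.3f}")
```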

Number of possible samples: nCm (the number of ways of choosing a sample of m units
from a population of n units).
1. Sample size for known population
Known population size N = 5708.
Assuming that you use a 95% confidence level, the error level is 0.05. The minimum
sample size is calculated by using the Yamane formula (margin of error e = 0.05):
n = N / (1 + Ne²)
The calculation follows:
= 5708 / (1 + 5708(0.0025))
= 5708 / (1 + 14.27)
= 5708 / 15.27
= 373.8
The required sample size is about 374.
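
The same calculation as a small helper function (a sketch; the function name is my own):

```python
import math

def yamane_sample_size(population: int, margin_of_error: float = 0.05) -> int:
    """Taro Yamane's formula: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))

print(yamane_sample_size(5708))   # -> 374
```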
2. Sample size for unknown population
Where the population size is unknown, the sample size can be derived by computing the
minimum sample size required for accuracy in estimating proportions, using the standard
normal deviate at the 95% confidence level (Z = 1.96), the percentage picking a choice or
response (p = 50% = 0.5), and the margin of error (e = 0.05 = ±5%). The formula is:

n = Z²·p·q / e²

where:
· Z is the standard normal deviate set by the confidence level (found in a Z table;
  1.96 at 95%),
· p is the (estimated) proportion of the population which has the attribute in question,
· q is 1 - p,
· e is the desired level of precision (i.e. the margin of error).

At the 95% confidence level with a 5% margin of error:
n = (1.96)²(0.5)(0.5) / (0.05)² = 384, which was rounded up to 400.

Here is how the math works, assuming you chose a 95% confidence level, 0.5 standard
deviation, and a margin of error (confidence interval) of +/- 5%:

((1.96)² x 0.5(0.5)) / (0.05)²
(3.8416 x 0.25) / 0.0025
0.9604 / 0.0025
= 384.16
385 respondents are needed.
If you find your sample size is too large to handle, try slightly decreasing your confidence
level or increasing your margin of error - this will increase the chance for error in your
sampling, but it can greatly decrease the number of responses you need.
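
The formula for unknown populations as a reusable sketch (hypothetical helper name; the defaults follow the worked example above):

```python
import math

def cochran_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Cochran's formula for proportions: n = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

print(cochran_sample_size())   # -> 385
```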

1. In order to estimate the sample size, three issues need to be studied: the level of
precision, the confidence or risk level, and the variability. Regarding the last issue, on
which the question is concentrated: the degree of variability in the attributes being
measured refers to the distribution of attributes in the population.
2. The more heterogeneous a population, the larger the sample size required to obtain a
given level of precision. The less variable (more homogeneous) a population, the smaller
the sample size.
Note that a proportion of 50% indicates a greater level of variability than either 20% or 80%.
This is because 20% and 80% indicate that a large majority do not or do, respectively, have
the attribute of interest. Because a proportion of 0.5 indicates the maximum variability in a
population, it is often used in determining a more conservative sample size; that is, the
sample size may be larger than if the true variability of the population attribute were used.

Cronbach's alpha is a measure of internal consistency, that is, how closely related a set of
items are as a group. It is considered to be a measure of scale reliability. A "high" value
for alpha does not imply that the measure is unidimensional. Rules of thumb:

α ≥ 0.9          Excellent
0.8 ≤ α < 0.9    Good
0.7 ≤ α < 0.8    Acceptable
0.6 ≤ α < 0.7    Questionable
α < 0.6          Poor
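
A sketch of the standard formula, alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores), applied to a hypothetical item-score matrix:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = respondents, columns = questionnaire items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses of 5 people to 4 Likert items
scores = np.array([[4, 5, 4, 4],
                   [3, 3, 2, 3],
                   [5, 5, 5, 4],
                   [2, 2, 3, 2],
                   [4, 4, 4, 5]])
print(f"alpha = {cronbach_alpha(scores):.3f}")
```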

Topic 3: Measures of Central Tendency

Arithmetic Mean
  Type of data: Continuous; Interval or Ratio.
  Interpretation: The sum of deviations from the mean is zero; the sum of squared
  deviations from the mean is a minimum.
  Limitations: Distorts central tendency in skewed or open-ended distributions.
  Remarks: Best single estimate for any parametric distribution.

Median
  Type of data: Ordinal, Interval or Ratio level; skewed, open-ended or incomplete data.
  Interpretation: Divides the distribution into two equal parts.
  Limitations: Requires arrangement of the data in a sequence.
  Remarks: Best measure of central tendency for skewed and open-ended data.

Mode
  Type of data: Nominal, Ordinal, Interval or Ratio.
  Interpretation: The most frequently occurring data value; coincides with the mean and
  median in a normal distribution.
  Limitations: There may be more than one mode.
  Remarks: Used in marketing research and for preference-type data.

Geometric Mean
  Type of data: Ratio level.
  Interpretation: Average rate of change.
  Limitations: Never greater than the AM.
  Remarks: Growth rate of Gross Domestic Product; useful in banking.

Harmonic Mean
  Type of data: Ratio level.
  Limitations: The HM is never greater than the geometric mean.
  Remarks: Averaging speeds and other ratios.

Quartiles (Q1 = first quartile; Q2 = median or second quartile; Q3 = third quartile)
  Type of data: Ordinal, Interval and Ratio level.
  Interpretation: Divide the distribution into four equal parts, each containing 25% of the
  data values.
  Limitations: Relatively insensitive; require an ordering of the data.
  Remarks: Useful for skewed data sets when sensitive measures are not meaningful.

Deciles
  Type of data: Ordinal, Interval or Ratio level.
  Interpretation: Divide the distribution into ten equal parts, each containing 10% of the
  data values.
  Limitations: Relatively insensitive; require an ordering of the data.
  Remarks: Suitable to define the location of data values within the distribution.

Percentiles
  Type of data: Ordinal, Interval or Ratio level.
  Interpretation: Divide the distribution into 100 equal parts.
  Limitations: Relatively insensitive; require an ordering of the data.
  Remarks: Useful to provide a fine description of a data set.
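
For illustration, the measures above computed on a small hypothetical data set; scipy provides the geometric and harmonic means directly:

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical observations

print("AM:", data.mean())
print("GM:", stats.gmean(data))              # geometric mean (ratio data)
print("HM:", stats.hmean(data))              # harmonic mean (ratio data)
print("Median:", np.median(data))
print("Mode:", stats.mode(data, keepdims=False).mode)
print("Quartiles:", np.percentile(data, [25, 50, 75]))
```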

1. Levene's Test for Homogeneity of Variances is included in the output when you
run a one-way ANOVA in SPSS Statistics.

2. Welch F test - an alternative analysis of variance used when group variances are unequal.

Topic 4: Measures of Dispersion

Range
  Interpretation: The difference between the highest and lowest values in the distribution;
  indicates the possible limits of data values.
  Limitations: Not useful in the presence of extreme values.
  Remarks: Rough comparison of the spread of a characteristic.

Quartile Deviation
  Type of data: Ordinal, Interval or Ratio level; skewed, open-ended or incomplete data.
  Interpretation: Defines the 50% range of data values between the first and third quartiles.
  Limitations: Distorted by extreme values.
  Remarks: Best measure of dispersion for skewed, open-ended data.

Standard Deviation & Variance
  Type of data: Interval or Ratio.
  Interpretation: The most used measure of dispersion; a larger value indicates larger
  dispersion from the mean. The standard deviation measured about the mean is a minimum.
  Limitations: Requires data measured at a higher level.
  Remarks: Useful with the mean to describe the various features of a data distribution;
  useful to combine the standard deviations of two or more populations.

Coefficient of Variation
  Type of data: Interval or Ratio.
  Interpretation: Useful to compare two or more distributions using the mean and
  standard deviation.
  Limitations: Requires data at a higher level to compute the mean and standard deviation.
  Remarks: Convenient for comparing the characteristics of many distributions.

Skewness
  Type of data: Interval or Ratio.
  Interpretation: Measures the symmetry of a distribution; skewness of zero means
  perfectly symmetrical. A positive value means the tail is to the right; a negative value
  means the tail is to the left.
  Limitations: Not available for nominal data and weak for ordinal data.
  Remarks: The difference between the mean and the median or mode can be used to
  measure skewness.

Kurtosis
  Type of data: Interval or Ratio.
  Interpretation: Indicates the "peakedness" of a distribution. A positive value means the
  curve is leptokurtic, a negative value means the curve is platykurtic, and a zero value
  means the curve is normal (mesokurtic).
  Limitations: Useless for nominal data.
  Remarks: Useful for measuring dispersion together with the mean and standard
  deviation; depends on the concentration of values around the mean and in the tails of
  the distribution.
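
The dispersion measures above on the same kind of hypothetical data, using numpy and scipy:

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)  # hypothetical observations

print("Range:", data.max() - data.min())
print("Quartile deviation:", (np.percentile(data, 75) - np.percentile(data, 25)) / 2)
print("Std dev:", data.std(ddof=1))
print("Coefficient of variation:", data.std(ddof=1) / data.mean())
print("Skewness:", stats.skew(data))
print("Excess kurtosis:", stats.kurtosis(data))   # 0 for a normal distribution
```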

Topic 5: Probability

Discrete Probability - A discrete distribution describes the probability of occurrence of
each value of a discrete random variable. A discrete random variable is a random variable
that has countable values, such as a list of non-negative integers. For example, the number
of heads in a series of coin tosses must be a discrete variable.

Continuous Probability - If a variable can take on any value between two specified values,
it is called a continuous variable; otherwise, it is called a discrete variable. Time, for
example, is continuous: between any two instants there are infinitely many values.

Important probability distributions: Binomial Distribution, Poisson Distribution,
Normal Distribution.
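
A brief scipy sketch of the three distributions just named (all parameters are illustrative):

```python
from scipy import stats

# Binomial: P(exactly 3 heads in 10 fair coin tosses)
print(stats.binom.pmf(k=3, n=10, p=0.5))

# Poisson: P(exactly 2 arrivals when the mean rate is 4 per interval)
print(stats.poisson.pmf(k=2, mu=4))

# Normal: P(value below 1.96) for a standard normal variable
print(stats.norm.cdf(1.96))
```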

Topic 6: Tests of Hypothesis

Test               Purpose                        Type of Data    Interpretation/Remarks

Chi-square         To test independence of        Nominal and     Useful to test whether variables
                   variables                      Ordinal         differ significantly or not; also
                                                                  useful to determine the extent of
                                                                  association between cross-tabulated
                                                                  variables.

Proportion test    To test the difference         Nominal         Useful for dichotomous data in a
                   between two proportions                        2x2 table with large sample sizes;
                                                                  based on the normal approximation.

t-test             To test the difference         Interval,       Useful for small samples; assumes
                   between two sample means       Nominal         that the samples are drawn from a
                                                                  normal population.

Analysis of        To test differences among     Interval,        Extension of the t-test; compares
Variance           two or more sample means      Categorical      the means of more than two groups;
                                                                  normality is assumed for each group.

Regression         To test the significance of   Interval,        Describes the extent, direction and
Analysis           regression coefficients       Ratio            strength of the relationship between
                                                                  several variables and a dependent
                                                                  variable.

Sign Test          To test whether the           Ordinal          Equivalent to the t-test; does not
                   distributions of two                           require the assumption of normality.
                   related samples are
                   the same

Wilcoxon           To test the distributions     Ordinal          More powerful than the sign test.
signed-rank Test   of two related samples,
                   taking into account the
                   magnitude of the differences

Mann-Whitney U     To test whether two           Ordinal          Analogous to the t-test but less
Test (Wilcoxon     independent samples are                        powerful. When distributions depart
rank-sum Test)     from the same population                       from normality, the Mann-Whitney
                                                                  test is the better choice.
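
Two of the tests above as a scipy sketch, again on hypothetical data:

```python
import numpy as np
from scipy import stats

# t-test: do two small samples share the same mean?
group_a = np.array([5.1, 4.9, 5.4, 5.0, 5.2])
group_b = np.array([4.6, 4.8, 4.5, 4.9, 4.7])
t_stat, p_val = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_val:.4f}")

# Chi-square test of independence on a 2x2 cross-tabulation
table = np.array([[30, 10],
                  [20, 40]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```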

Difference between one-tailed and two-tailed tests

Factor of             One-tailed Test                     Two-tailed Test
Comparison
Meaning               A statistical hypothesis test in    A significance test in which the
                      which the alternative hypothesis    alternative hypothesis has two
                      has only one end is known as a      ends is called a two-tailed test.
                      one-tailed test.
Hypothesis            It is directional: more than or     It is not directional.
                      less than.
Region of rejection   Either left or right                Both left and right
Determines            Whether there is a relationship     Whether there is a relationship
                      between variables in a single       between variables in either
                      direction                           direction
Result                Greater or less than a certain      Greater or less than a certain
                      value                               range of values
Sign in alternative   < or >                              ≠
hypothesis

http://www.graphpad.com/guides/prism/6/statistics/index.htm?stat_qa_choosing_a_test
_to_compare_.htm
If I have data from three or more groups, is it OK to compare two groups at
a time with a t test?
No. You should analyze all the groups at once with ​one-way ANOVA​, and then 
follow up with ​multiple comparison  tests​. The only exception is when some of the 
'groups' are really controls to prove the assay worked, and are not really part of 
the experimental question you are asking. 
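
For the case raised in this question, a minimal one-way ANOVA sketch with scipy over three hypothetical groups (post-hoc multiple comparisons would then be run with a library such as statsmodels):

```python
from scipy import stats

# Three hypothetical treatment groups measured on the same outcome
group1 = [24.1, 25.3, 26.2, 24.8]
group2 = [28.4, 27.9, 29.1, 28.2]
group3 = [23.9, 24.4, 23.5, 24.0]

f_stat, p_val = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_val:.4f}")  # small p suggests at least one mean differs
```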
I know the mean, SD (or SEM) and sample size for each group. Which tests
can I run?
You can enter data as mean, SD (or SEM) and N, and Prism can compute an
unpaired t test. Prism cannot perform a paired test, as that requires analyzing
each pair. It also cannot do any nonparametric tests, as these require ranking the
data.
I only know the two group means, and don't have the raw data and don't
know their SD or SEM. Can I run a t test?
No. The t test compares the difference between two means and compares that 
difference to the standard error of the difference, computed from the standard 
deviations and sample size. If you only know the two means, there is no possible 
way to do any statistical comparison. 

Can I use a normality test to make the choice of when to use a
nonparametric test?
It is ​not a good idea​ to base your decision solely on the normality test. Choosing 
when to use a nonparametric test is not a straightforward decision, and you can't 
really automate the process. 
I want to compare two groups. The outcome has two possibilities, and I
know the fraction of each possible outcome in each group. How can I
compare the groups?
Not with a t test. Enter your data into a ​contingency table​ and analyze 
with ​Fisher's​ exact test. 
I want to compare the mean survival time in two groups. But some
subjects are still alive so I don't know how long they will live. How can I
do a t test on survival times?
You should use special methods designed to ​compare survival curves​. Don't run a 
t test on survival times. 
I don't know whether it is ok to assume equal variances. Can't a statistical
test tell me whether or not to use the Welch t test?
While that sounds like a good idea, ​in fact it is not​. The decision really should be 
made as part of the experimental design and not based on inspecting the data. 
I don't know whether it is better to use the regular paired t test or the ratio
test. Is it ok to run both, and report the results with the smallest P value?
No. The results of any statistical test can only be interpreted at face value when the 
choice of analysis method was part of the experimental design. 

