
Review questions 1 and 2

1. (a) Why does OLS estimation involve taking vertical deviations of the points to
the line rather than horizontal distances?

(b) Why are the vertical distances squared before being added together?

(c) Why are the squares of the vertical distances taken rather than the absolute
values?

2. Explain, with the use of equations, the difference between the sample regression
function and the population regression function.

3. What is an estimator? Is the OLS estimator superior to all other estimators?
Why or why not?

4. What five assumptions are usually made about the unobservable error terms in
the classical linear regression model (CLRM)? Briefly explain the meaning of
each. Why are these assumptions made?

6. The capital asset pricing model (CAPM) can be written as

E(Ri) = Rf + βi[E(Rm) − Rf]

using the standard notation.

The first step in using the CAPM is to estimate the stock’s beta using the market
model.

The market model can be written as

Rit = αi + βi Rmt + uit

where Rit is the excess return for security i at time t, Rmt is the excess
return on a proxy for the market portfolio at time t, and uit is an iid random
disturbance term. The coefficient βi in this case is also the CAPM beta for
security i.

Suppose that you had estimated the market model and found that the estimated
value of beta for a stock, β̂, was 1.147. The standard error associated with
this coefficient, SE(β̂), is estimated to be 0.0548.

A city analyst has told you that this security closely follows the market, but
that it is no more risky, on average, than the market. This can be tested by the
null hypothesis that the value of beta is one. The model is estimated over 62
daily observations. Test this hypothesis against a one-sided alternative that
the security is more risky than the market, at the 5% level. Write down the null
and alternative hypotheses. What do you conclude? Are the analyst's claims
empirically verified?

7. The analyst also tells you that shares in Chris Mining PLC have no systematic
risk, in other words that the returns on its shares are completely unrelated to
movements in the market. The value of beta and its standard error are calculated to
be 0.214 and 0.186, respectively. The model is estimated over 38 quarterly
observations. Write down the null and alternative hypotheses. Test this null
hypothesis against a two-sided alternative.
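
A sketch of this two-sided test, again assuming Python with `scipy` (not part of the original question):

```python
from scipy import stats

# H0: beta = 0 (no systematic risk)  vs  H1: beta != 0, at the 5% level.
beta_hat, se_beta = 0.214, 0.186
T, k = 38, 2                             # 38 quarterly observations
t_stat = (beta_hat - 0) / se_beta
t_crit = stats.t.ppf(0.975, df=T - k)    # two-sided 5%: 2.5% in each tail

print(round(t_stat, 3))      # 1.151
print(round(t_crit, 3))      # 2.028
print(abs(t_stat) > t_crit)  # False -> cannot reject H0 of no systematic risk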

8. Form and interpret a 95% and a 99% confidence interval for beta using the
figures given in question 7.
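
The two intervals can be computed as follows (an illustrative sketch assuming `scipy`, using the figures from question 7):

```python
from scipy import stats

# Confidence interval: beta_hat +/- t_crit * SE(beta_hat), with 38 - 2 = 36 df.
beta_hat, se_beta, df = 0.214, 0.186, 36

for level in (0.95, 0.99):
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df)
    lo, hi = beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta
    print(f"{level:.0%} CI: ({lo:.3f}, {hi:.3f})")
# 95% CI: (-0.163, 0.591)
# 99% CI: (-0.292, 0.720)
```

Both intervals contain zero, which mirrors the failure to reject H0: β = 0 in question 7.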

9. Are hypotheses tested concerning the actual values of the coefficients (i.e.
β) or their estimated values (i.e. β̂), and why?

10. Using EViews, select one of the other stock series from the ‘capm.wk1’ file
and estimate a CAPM beta for that stock. Test the null hypothesis that the true beta
is one and also test the null hypothesis that the true alpha (intercept) is zero. What
are your conclusions?

Review questions 3

1. By using examples from the relevant statistical tables, explain the
relationship between the t- and the F-distributions.
For questions 2–5, assume that the econometric model is of the form

yt = β1 + β2x2t + β3x3t + β4x4t + β5x5t + ut
2. Which of the following hypotheses about the coefficients can be tested using
a t-test?
Which of them can be tested using an F-test? In each case, state the number
of restrictions.

(a) H0 : β3 =2

(b) H0 : β3 +β4 =1

(c) H0 : β3 +β4 =1 and β5 =1


(d) H0 : β2 =0 and β3 =0 and β4 =0 and β5 =0

(e) H0 : β2β3 =1

3. Which of the above null hypotheses constitutes 'THE' regression F-statistic?
Why is this null hypothesis always of interest whatever the regression
relationship under study? What exactly would constitute the alternative
hypothesis in this case?

4. Which would you expect to be bigger – the unrestricted residual sum of squares
or the restricted residual sum of squares, and why?

5. You decide to investigate the relationship given in the null hypothesis of


question 2, part (c). What would constitute the restricted regression? The
regressions are carried out on a sample of 96 quarterly observations, and the
residual sums of squares for the restricted and unrestricted regressions are 102.87
and 91.41, respectively. Perform the test. What is your conclusion?
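
The F-test for the two restrictions in 2(c) can be sketched as follows (an illustrative sketch assuming `scipy`; the figures are those given in the question):

```python
from scipy import stats

# F = ((RRSS - URSS)/m) / (URSS/(T - k)), m restrictions, k parameters.
rrss, urss = 102.87, 91.41
T, k, m = 96, 5, 2                      # 96 obs, 5 params, 2 restrictions
F_stat = ((rrss - urss) / m) / (urss / (T - k))
F_crit = stats.f.ppf(0.95, m, T - k)    # 5% critical value, F(2, 91) ~ 3.10

print(round(F_stat, 3))  # 5.704
print(F_stat > F_crit)   # True -> reject H0; the restrictions are rejected
```

The statistic exceeds the critical value, so the joint restriction β3 + β4 = 1 and β5 = 1 is rejected at the 5% level.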

6. You estimate a regression of the form given by (3.52) below in order to evaluate
the effect of various firm-specific factors on the returns of a sample of firms. You
run a cross-sectional regression with 200 firms

ri = β0 +β1Si +β2MBi +β3PEi +β4BETAi +ui

where: ri is the percentage annual return for the stock

Si is the size of firm i measured in terms of sales revenue

MBi is the market to book ratio of the firm

PEi is the price/earnings (P/E) ratio of the firm

BETAi is the stock’s CAPM beta coefficient

You obtain the following results (with standard errors in parentheses)

r̂i = 0.080 + 0.801Si + 0.321MBi + 0.164PEi − 0.084BETAi

     (0.064)  (0.147)   (0.136)   (0.420)    (0.120)

Calculate the t-ratios. What do you conclude about the effect of each variable on
the returns of the security? On the basis of your results, what variables would you
consider deleting from the regression? If a stock’s beta increased from 1 to 1.2,
what would be the expected effect on the stock’s return? Is the sign on beta as you
would have expected? Explain your answers in each case.
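
The t-ratio computation itself is simple division; a sketch (illustrative Python, not part of the original exercise):

```python
# t-ratio = coefficient / standard error, for each reported parameter.
coefs = {"const": 0.080, "S": 0.801, "MB": 0.321, "PE": 0.164, "BETA": -0.084}
ses   = {"const": 0.064, "S": 0.147, "MB": 0.136, "PE": 0.420, "BETA": 0.120}

t_ratios = {name: coefs[name] / ses[name] for name in coefs}
for name, t in t_ratios.items():
    print(f"{name:5s} t = {t:6.2f}")
# const 1.25, S 5.45, MB 2.36, PE 0.39, BETA -0.70
# With 200 - 5 = 195 degrees of freedom the two-sided 5% critical value is
# about 1.97, so only S and MB are individually significant; PE and BETA
# would be candidates for deletion.
```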

7. A researcher estimates the following econometric models including a lagged
dependent variable

yt = β1 + β2x2t + β3x3t + β4yt−1 + ut

yt − yt−1 = γ1 + γ2x2t + γ3x3t + γ4yt−1 + vt

where ut and vt are iid disturbances.

Will these models have the same value of

(a) The residual sum of squares (RSS),

(b) R2,

(c) Adjusted R2? Explain your answers in each case.

8. A researcher estimates the following two econometric models

yt = β1 +β2x2t +β3x3t +ut

yt = β1 +β2x2t +β3x3t +β4x4t +vt

where ut and vt are iid disturbances and x4t is an irrelevant variable which
does not enter into the data generating process for yt. Will the value of

(a) R2,

(b) Adjusted R2, be higher for the second model than the first? Explain your
answers.

9. Re-open the CAPM EViews file and estimate CAPM betas for each of the other
stocks in the file.

(a) Which of the stocks, on the basis of the parameter estimates you obtain, would
you class as defensive stocks and which as aggressive stocks? Explain your
answer.
(b) Is the CAPM able to provide any reasonable explanation of the overall
variability of the returns to each of the stocks over the sample period? Why or why
not?

10. Re-open the Macro file and apply the same APT-type model to some of the
other time series of stock returns contained in the CAPM file.

(a) Run the stepwise procedure in each case. Is the same sub-set of variables
selected for each stock? Can you rationalize the differences between the series
chosen?

(b) Examine the sizes and signs of the parameters in the regressions in each case –
do these make sense?

11. What are the units of R2?

Review questions 4

1. Are assumptions made concerning the unobservable error terms (ut) or about
their sample counterparts, the estimated residuals (ût)? Explain your answer.

2. What pattern(s) would one like to see in a residual plot and why?

3. A researcher estimates the following model for stock market returns, but thinks
that there may be a problem with it. By calculating the t-ratios, and considering
their significance and by examining the value of R2 or otherwise, suggest what the
problem might be.

ŷt = 0.638 + 0.402x2t − 0.891x3t    R2 = 0.96, R̄2 = 0.89

     (0.436)  (0.291)   (0.763)

How might you go about solving the perceived problem?
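
The t-ratios make the symptom visible at once; a sketch (illustrative Python, using the reported output):

```python
# t-ratio = coefficient / standard error for each reported parameter.
coefs = [0.638, 0.402, -0.891]
ses   = [0.436, 0.291, 0.763]

t_ratios = [c / s for c, s in zip(coefs, ses)]
print([round(t, 2) for t in t_ratios])  # [1.46, 1.38, -1.17]
# None of the t-ratios is individually significant, yet R^2 = 0.96: the
# classic symptom of near multicollinearity between x2t and x3t.
```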

4. (a) State in algebraic notation and explain the assumption about the CLRM’s
disturbances that is referred to by the term ‘homoscedasticity’.

(b) What would the consequence be for a regression model if the errors were not
homoscedastic?

(c) How might you proceed if you found that (b) were actually the case?

5. (a) What do you understand by the term 'autocorrelation'?

(b) An econometrician suspects that the residuals of her model might be
autocorrelated. Explain the steps involved in testing this theory using the
Durbin–Watson (DW) test.

(c) The econometrician follows your guidance (!!!) in part (b) and calculates a
value for the Durbin–Watson statistic of 0.95. The regression has 60 quarterly
observations and three explanatory variables (plus a constant term). Perform the
test. What is your conclusion?
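
A sketch of the decision rule (illustrative Python; the bounds dL and dU must come from DW tables — the values below, roughly dL ≈ 1.48 and dU ≈ 1.69 for T = 60 and three regressors at 5%, are taken from standard tables and should be checked against your own):

```python
# Durbin-Watson decision rule for positive autocorrelation.
dw = 0.95
dL, dU = 1.48, 1.69  # approximate 5% bounds for T = 60, k' = 3 (from DW tables)

if dw < dL:
    print("reject H0: evidence of positive first-order autocorrelation")
elif dw < dU:
    print("inconclusive region")
else:
    print("do not reject H0 of no autocorrelation")
# -> reject H0: 0.95 lies well below dL
```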

(d) In order to allow for autocorrelation, the econometrician decides to use a model
in first differences with a constant

Δyt = β1 + β2Δx2t + β3Δx3t + β4Δx4t + ut

By attempting to calculate the long-run solution to this model, explain what might
be a problem with estimating models entirely in first differences.

(e) The econometrician finally settles on a model with both first differences and
lagged levels terms of the variables

Δyt = β1 + β2Δx2t + β3Δx3t + β4Δx4t + β5x2t−1 + β6x3t−1 + β7x4t−1 + vt

Can the Durbin–Watson test still validly be used in this case?

6. Calculate the long-run static equilibrium solution to the following dynamic
econometric model

Δyt = β1 + β2x2t + β3Δx3t + β4yt−1 + β5x2t−1 + β6x3t−1 + β7x3t−4 + ut
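
One way to set out the algebra: in the long run all changes are zero and each variable settles at its equilibrium level, so the time subscripts can be dropped.

```latex
% Set \Delta y_t = \Delta x_{3t} = 0 and x_{2t} = x_{2t-1} = x_2, etc.:
0 = \beta_1 + \beta_2 x_2 + \beta_4 y + \beta_5 x_2 + \beta_6 x_3 + \beta_7 x_3
% Solving for y gives the long-run static equilibrium solution:
y = -\frac{\beta_1 + (\beta_2 + \beta_5)\,x_2 + (\beta_6 + \beta_7)\,x_3}{\beta_4}
```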

7. What might Ramsey’s RESET test be used for? What could be done if it were
found that the RESET test has been failed?

8. (a) Why is it necessary to assume that the disturbances of a regression model are
normally distributed?
(b) In a practical econometric modeling situation, how might the problem that the
residuals are not normally distributed be addressed?

9. (a) Explain the term 'parameter structural stability'.

(b) A financial econometrician thinks that the stock market crash of October
1987 fundamentally changed the risk–return relationship given by the CAPM
equation. He decides to test this hypothesis using a Chow test. The model is
estimated using monthly data from January 1981 to December 1995, and then two
separate regressions are run for the sub-periods corresponding to data before
and after the crash.

The model is

rt = α+βRmt +ut (4.79)

so that the excess return on a security at time t is regressed upon the excess return
on a proxy for the market portfolio at time t. The results for the three models
estimated for shares in British Airways (BA) are as follows:

1981M1–1995M12

rt =0.0215+1.491rmt RSS=0.189 T =180 (4.80)

1981M1–1987M10

rt =0.0163+1.308rmt RSS=0.079 T =82 (4.81)

1987M11–1995M12

rt =0.0360+1.613rmt RSS=0.082 T =98 (4.82)

(c) What are the null and alternative hypotheses that are being tested here, in
terms of α and β?

(d) Perform the test. What is your conclusion?
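
The Chow test computation can be sketched as follows (illustrative Python assuming `scipy`, using the RSS figures from (4.80)–(4.82)):

```python
from scipy import stats

# Chow test: F = ((RSS - (RSS1 + RSS2))/k) / ((RSS1 + RSS2)/(T - 2k)).
rss_whole, rss_1, rss_2 = 0.189, 0.079, 0.082
T, k = 180, 2                              # whole-sample obs; params per regression
F_stat = ((rss_whole - (rss_1 + rss_2)) / k) / ((rss_1 + rss_2) / (T - 2 * k))
F_crit = stats.f.ppf(0.95, k, T - 2 * k)   # 5% critical value, F(2, 176) ~ 3.05

print(round(F_stat, 2))  # 15.3
print(F_stat > F_crit)   # True -> reject H0 of parameter stability across the crash
```

The null of no structural change is comfortably rejected, supporting the econometrician's suspicion.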

10. For the same model as above, and given the following results, do a forward and
backward predictive failure test:

1981M1–1995M12
rt =0.0215+1.491rmt RSS=0.189 T =180 (4.83)

1981M1–1994M12

rt =0.0212+1.478rmt RSS=0.148 T =168 (4.84)

1982M1–1995M12

rt =0.0217+1.523rmt RSS=0.182 T =168 (4.85)

What is your conclusion?
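
Both predictive failure tests follow the same template; a sketch (illustrative Python assuming `scipy`, using the RSS figures from (4.83)–(4.85)):

```python
from scipy import stats

# Predictive failure test:
# F = ((RSS_whole - RSS_sub)/n_forecast) / (RSS_sub/(T1 - k))
def pf_test(rss_whole, rss_sub, n_forecast, T1, k=2):
    return ((rss_whole - rss_sub) / n_forecast) / (rss_sub / (T1 - k))

f_fwd = pf_test(0.189, 0.148, 12, 168)  # forward: hold back 1995M1-1995M12
f_bwd = pf_test(0.189, 0.182, 12, 168)  # backward: hold back 1981M1-1981M12
f_crit = stats.f.ppf(0.95, 12, 166)     # 5% critical value, F(12, 166) ~ 1.80

print(round(f_fwd, 2), round(f_bwd, 2))  # 3.83 0.53
print(f_fwd > f_crit, f_bwd > f_crit)    # True False
```

So the model fails to predict the final year (forward test rejects) but has no difficulty with the first year (backward test does not reject).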

11. Why is it desirable to remove insignificant variables from a regression?

12. Explain why it is not possible to include an outlier dummy variable in a
regression model when you are conducting a Chow test for parameter stability.
Will the same problem arise if you were to conduct a predictive failure test?
Why or why not?

13. Re-open the 'macro.wf1' file and apply the stepwise procedure including all
of the explanatory variables as listed above (i.e. dprod, dcredit, dinflation,
dmoney, dspread, rterm) with a strict 5% threshold criterion for inclusion in
the model. Then examine the resulting model both financially and statistically
by investigating the signs, sizes and significances of the parameter estimates
and by conducting all of the diagnostic tests for model adequacy.

Key points unit 1: Introduction

■ Financial econometrics is the science of modeling and forecasting financial data.

■ The three steps in applying financial econometrics are model selection, model
estimation, and model testing. In model selection, the modeler chooses a family of
models with given statistical properties. Financial economic theory is used to
justify the model choice. The financial econometric tool used is determined in this
step.

■ Data mining is an approach to model selection based solely on the data and,
although useful, must be used with great care because the risk is that the model
selected might capture special characteristics of the sample which will not repeat in
the future.

■ In general, models are embodied in mathematical expressions that include a
number of parameters that have to be estimated from sample data. Model
estimation involves finding estimators and understanding the behavior of
estimators.

■ Model testing is needed because model selection and model estimation are
performed on historical data and, as a result, there is the risk that the estimation
process captures characteristics that are specific to the sample data used but are not
general and will not necessarily reappear in future samples.

■ Model testing involves assessing the model’s performance using fresh data. The
procedure for doing so is called back testing and the most popular way of doing so
is using a moving window.

■ The data generating process refers to the mathematical model that represents
future data in function of past and present data. By knowing the data generating
process as a mathematical expression, computer programs that simulate data using
Monte Carlo methods can be implemented and the data generated can be used to
compute statistical quantities that would be difficult or even impossible to compute
mathematically.

■ Financial econometric techniques have been used in the investment management
process for making decisions regarding asset allocation (i.e., allocation of
funds among the major asset classes) and portfolio construction (i.e., selection
of individual assets within an asset class). In addition, the measurements of
portfolio risk with respect to risk factors that are expected to impact the
performance of a portfolio relative to a benchmark are estimated using financial
econometric techniques.

Key points unit 2: Simple Linear Regression

■ Correlation or covariance is used to measure the association between two
variables.

■ A regression model is employed to model the dependence of a variable (called
the dependent variable) on one (or more) explanatory variables.

■ In the basic regression, the functional relationship between the dependent
variable and the explanatory variables is expressed as a linear equation and
hence is referred to as a linear regression model.

■ When the linear regression model includes only one explanatory variable, the
model is said to be a simple linear regression.

■ The error term, or the residual, in a simple linear regression model captures
the variation in the dependent variable that is not explained by the
explanatory variable.

■ The error term is assumed to be normally distributed with zero mean and
constant variance.

■ The parameters of a simple linear regression model are estimated using the
method of ordinary least squares, which provides the best linear unbiased
estimates of the parameters.

■ The coefficient of determination, denoted by R2, is a measure of the
goodness-of-fit of the regression line. This measure, which has a value that
ranges from 0 to 1, indicates the percentage of the total sum of squares
explained by the explanatory variable in a simple linear regression.
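
These two key points can be illustrated with a small simulation (an illustrative sketch on synthetic data, assuming `numpy`; it is not part of the original notes):

```python
import numpy as np

# Fit a simple linear regression by OLS and compute R^2 = 1 - RSS/TSS.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)  # true line plus noise

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # OLS slope
a = y.mean() - b * x.mean()                           # OLS intercept
y_hat = a + b * x

tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
r2 = 1 - rss / tss                  # coefficient of determination

print(0.0 <= r2 <= 1.0)  # True: R^2 always lies between 0 and 1 for OLS
```
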

Key points unit 3: Multiple Linear Regression

■ A multiple linear regression is a linear regression that has more than one
independent or explanatory variable.

■ There are three assumptions regarding the error terms in a multiple linear
regression:

(1) they are normally distributed with zero mean,

(2) the variance is constant, and

(3) they are independent.

■ The ordinary least squares method is used to estimate the parameters of a
multiple linear regression model.

■ The three steps involved in designing a multiple linear regression model are

(1) specification of the dependent and independent variables to be included in
the model,

(2) fitting/estimating the model, and

(3) evaluating the quality of the model with respect to the given data (diagnosis of
the model).

■ There are criteria for diagnosing the quality of a model. The tests used involve
statistical tools from inferential statistics. The estimated regression errors play an
important role in these tests and the tests accordingly are based on the three
assumptions about the error terms.

■ The first test is for the statistical significance of the multiple coefficient of
determination, which is the ratio of the sum of squares explained by the regression
and the total sum of squares.

■ If the standard deviation of the regression errors from a proposed model is found
to be too large, the fit could be improved by an alternative specification. Some of
the variance of the errors may be attributable to the variation in some independent
variable not considered in the model.
■ An analysis of variance test is used to test for the statistical significance of the
entire model.

■ Because one can artificially increase the original R2 by including additional
independent variables into the regression, one will not know the true quality of
the model by evaluating the model using the same data. To deal with this
problem, the adjusted goodness-of-fit measure or adjusted R2 is used. This
measure takes into account the number of observations as well as the number of
independent variables.

■ To test for the statistical significance of individual independent variables,
a t-test is used.

■ To test for the statistical significance of a set or group of independent
variables, an F-test is used.

Difference between sample regression and population regression line?
A sample is a subset of a population. Usually it is impossible to test an entire
population, so tests are done on a sample of that population. These samples can
be selected so that they are representative of the population, in which case the
sample will have weights, strata, and clusters. But usually people use random
samples. So it's not that the line is different, it's that the line comes from
different data. In statistics we have formulas that allow a sample to represent
a population; if you had the entire population (again unlikely), you wouldn't
need to use these sample formulas, only the population formulas.

What is the difference between correlation and regression?

Correlation measures the strength of the association between variables, while
regression fits the best line describing how one variable depends on the others.

What is the line of regression?

The line that best describes the relationship between the dependent and
independent variables.

What is the definition of linear regression and correlation in statistics?

Whenever you are given a series of data points, you make a linear regression by
estimating a line that comes as close to running through the points as
possible. To maximize the accuracy of this line, it is constructed as a Least
Squares Regression Line (LSRL for short). The residual is the difference
between the actual y value of a data point and the y value predicted by your
line, and the LSRL minimizes the sum of the squares of your residuals.

A correlation is a number between -1 and 1 that indicates how well a straight
line represents a series of points. A value greater than zero means the line
has a positive slope; a value less than zero, a negative slope. The farther
away the correlation is from 0, the more accurately a straight line describes
the data.
What is the difference between the stochastic error term and the residual?

The residual is the difference between the observed Y and the estimated
regression line (Ŷ), while the error term is the difference between the
observed Y and the true regression equation (the expected value of Y). The
error term is a theoretical concept that can never be observed, but the
residual is a real-world value that is calculated for each observation every
time a regression is run. The residual can be thought of as an estimate of the
error term, and could be denoted ê.

Distinguish between correlation and regression?

Correlation is a measure of the degree of agreement in the changes (variances)
in two or more variables. In the case of two variables, if one of them
increases by the same amount for a unit increase in the other, then the
correlation coefficient is +1. If one of them decreases by the same amount for
a unit increase in the other, then the correlation coefficient is -1. Lesser
agreement results in an intermediate value.

Regression involves estimating or quantifying this relationship.

It is very important to remember that correlation and regression measure only
the linear relationship between variables. A symmetrical relationship, for
example, y = x² between values of x with equal magnitudes (-a < x < a), has a
correlation coefficient of 0, and the regression line will be a horizontal
line. Also, a relationship found using correlation or regression need not be
causal.

What does the term residual mean?

Let's say that you fit a simple regression line y = mx + b to a set of (x, y)
data points. In a typical research situation the regression line will not touch
all of the points; it might not touch any of them. The vertical difference
between the y-coordinate of one of the data points and the y value of the
regression line for the x-coordinate of that data point is called a residual.

Difference between sample regression and


population regression line?
A sample is a subset of a population. Usually it is impossible to test an entire
population so tests are done on a sample of that population. These samples can
be selected so that they are representative of the population in which cases the
sample will have weights, strata, and clusters.
But usually people use random samples. So it's not that the line is different, it's
that the line comes from different data. In stats we have formulas that allow a
sample to represent a population, if you have the entire population (again
unlikely), you wouldn't need to use this sample formulas, only the population
formulas.

What is the different between correlation and regression?


correlation we can do to find the strength of the variables. but regression helps to fit the
best line

What is the line of regression?


line that measures the slope between dependent and independent variables

What is Definition of linear regression and correlation in


statistics?
Whenever you are given a series of data points, you make a linear regression by
estimating a line that comes as close to running through the points as possible. To
maximize the accuracy of this line, it is constructed as a Least Square Regression Line
(LSRL for short). The regression is the difference between the actual y value of a data
point and the y value predicted by your line, and the LSRL minimizes the sum of all the
squares of your regression on the line.

A Correlation is a number between -1 and 1 that indicates how well a straight line
represents a series of points. A value greater than one means it shows a positive slope; a
value less than one, a negative slope. The farther away the correlation is from 0, the less
accurately a straight line describes the data.
What is the difference b/n stochastic error term and residual?
the residual is the difference between the observed Y and the estimated regression
line(Y), while the error term is the difference between the observed Y and the true
regression equation (the expected value of Y). Error term is theoretical concept that can
never be observed, but the residual is a real-world value that is calculated for each
observation every time a regression is run. The reidual can be thought of as an estimate of
the error term, and e could have been denoted as ^e.

Distinguish between correlation and regression?


Correlation is a measure of the degree of agreement in the changes (variances) in two or
more variables. In the case of two variables, if one of them increases by the same amount
for a unit increase in the other, then the correlation coefficient is +1. If one of them
decreases by the same amount for a unit increase in the other, then the correlation
coefficient is -1. Lesser agreement results in an intermediate value.

Regression involves estimating or quantifying this relationship.

It is very important to remember that correlation and regression measure only the linear
relationship between variables. A symmetrical relationshup, for example, y = x2 between
values of x with equal magnitudes (-a < x < a), has a correlation coefficient of 0, and the
regression line will be a horizontal line. Also, a relationship found using correlation or
regression need not be causal.
What does the term residual mean?
Let's say that you fit a simple regression line y = mx + b to a set of (x,y) data points. In a
typical research situation the regression line will not touch all of the points; it might not
touch any of them. The vertical difference between the y-co-ordinate of one of the data
points and the y value of the regression line for the x-co-ordinate of that data point is
called a residual.

Difference between sample regression and


population regression line?
A sample is a subset of a population. Usually it is impossible to test an entire
population so tests are done on a sample of that population. These samples can
be selected so that they are representative of the population in which cases the
sample will have weights, strata, and clusters.
But usually people use random samples. So it's not that the line is different, it's
that the line comes from different data. In stats we have formulas that allow a
sample to represent a population, if you have the entire population (again
unlikely), you wouldn't need to use this sample formulas, only the population
formulas.

What is the different between correlation and regression?


correlation we can do to find the strength of the variables. but regression helps to fit the
best line

What is the line of regression?


line that measures the slope between dependent and independent variables

What is Definition of linear regression and correlation in


statistics?
Whenever you are given a series of data points, you make a linear regression by
estimating a line that comes as close to running through the points as possible. To
maximize the accuracy of this line, it is constructed as a Least Square Regression Line
(LSRL for short). The regression is the difference between the actual y value of a data
point and the y value predicted by your line, and the LSRL minimizes the sum of all the
squares of your regression on the line.

A Correlation is a number between -1 and 1 that indicates how well a straight line
represents a series of points. A value greater than one means it shows a positive slope; a
value less than one, a negative slope. The farther away the correlation is from 0, the less
accurately a straight line describes the data.
What is the difference b/n stochastic error term and residual?

the residual is the difference between the observed Y and the estimated regression
line(Y), while the error term is the difference between the observed Y and the true
regression equation (the expected value of Y). Error term is theoretical concept that can
never be observed, but the residual is a real-world value that is calculated for each
observation every time a regression is run. The reidual can be thought of as an estimate of
the error term, and e could have been denoted as ^e.

Distinguish between correlation and regression?


Correlation is a measure of the degree of agreement in the changes (variances) in two or
more variables. In the case of two variables, if one of them increases by the same amount
for a unit increase in the other, then the correlation coefficient is +1. If one of them
decreases by the same amount for a unit increase in the other, then the correlation
coefficient is -1. Lesser agreement results in an intermediate value.
Regression involves estimating or quantifying this relationship.

It is very important to remember that correlation and regression measure only the linear
relationship between variables. A symmetrical relationshup, for example, y = x2 between
values of x with equal magnitudes (-a < x < a), has a correlation coefficient of 0, and the
regression line will be a horizontal line. Also, a relationship found using correlation or
regression need not be causal.
What does the term residual mean?
Let's say that you fit a simple regression line y = mx + b to a set of (x,y) data points. In a
typical research situation the regression line will not touch all of the points; it might not
touch any of them. The vertical difference between the y-co-ordinate of one of the data
points and the y value of the regression line for the x-co-ordinate of that data point is
called a residual.

Difference between sample regression and


population regression line?
A sample is a subset of a population. Usually it is impossible to test an entire
population so tests are done on a sample of that population. These samples can
be selected so that they are representative of the population in which cases the
sample will have weights, strata, and clusters.
But usually people use random samples. So it's not that the line is different, it's
that the line comes from different data. In stats we have formulas that allow a
sample to represent a population, if you have the entire population (again
unlikely), you wouldn't need to use this sample formulas, only the population
formulas.

What is the different between correlation and regression?


correlation we can do to find the strength of the variables. but regression helps to fit the
best line

What is the line of regression?


line that measures the slope between dependent and independent variables

What is Definition of linear regression and correlation in


statistics?
Whenever you are given a series of data points, you make a linear regression by
estimating a line that comes as close to running through the points as possible. To
maximize the accuracy of this line, it is constructed as a Least Square Regression Line
(LSRL for short). The regression is the difference between the actual y value of a data
point and the y value predicted by your line, and the LSRL minimizes the sum of all the
squares of your regression on the line.

A Correlation is a number between -1 and 1 that indicates how well a straight line
represents a series of points. A value greater than one means it shows a positive slope; a
value less than one, a negative slope. The farther away the correlation is from 0, the less
accurately a straight line describes the data.
What is the difference b/n stochastic error term and residual?

the residual is the difference between the observed Y and the estimated regression
line(Y), while the error term is the difference between the observed Y and the true
regression equation (the expected value of Y). Error term is theoretical concept that can
never be observed, but the residual is a real-world value that is calculated for each
observation every time a regression is run. The reidual can be thought of as an estimate of
the error term, and e could have been denoted as ^e.

Distinguish between correlation and regression?


Correlation is a measure of the degree of agreement in the changes (variances) in two or
more variables. In the case of two variables, if one of them increases by the same amount
for a unit increase in the other, then the correlation coefficient is +1. If one of them
decreases by the same amount for a unit increase in the other, then the correlation
coefficient is -1. Lesser agreement results in an intermediate value.

Regression involves estimating or quantifying this relationship.

It is very important to remember that correlation and regression measure only the linear
relationship between variables. A symmetrical relationship, for example y = x² over
values of x of equal magnitude (-a < x < a), has a correlation coefficient of 0, and the
regression line will be a horizontal line. Also, a relationship found using correlation or
regression need not be causal.
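The y = x² example above is easy to verify numerically; on a symmetric grid of x values the Pearson coefficient comes out exactly zero even though y is completely determined by x:

```python
# Pearson correlation coefficient computed from its definition.
def pearson_r(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [-2, -1, 0, 1, 2]        # symmetric about 0
ys = [x ** 2 for x in xs]     # perfect nonlinear dependence
r = pearson_r(xs, ys)         # 0: no *linear* relationship
```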
What does the term residual mean?
Let's say that you fit a simple regression line y = mx + b to a set of (x,y) data points. In a
typical research situation the regression line will not touch all of the points; it might not
touch any of them. The vertical difference between the y-co-ordinate of one of the data
points and the y value of the regression line for the x-co-ordinate of that data point is
called a residual.
Population Regression Function vs Sample
Regression Function?
1. The PRF is based on the population data as a whole; the SRF is based on sample data.

2. We can draw only one PRF line from a given population, but we can draw one SRF for each sample drawn from that population.

3. The PRF may exist only in our conception and imagination.

4. The PRF curve or line is the locus of the conditional mean/expectation of the dependent variable Y for fixed values of the explanatory variable X in the population. The SRF shows the estimated relation between the dependent variable Y and the explanatory variable X in a sample.

What is the relation between Sample and population regression function?

The sample regression function is a statistical approximation to the population regression function.
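That approximation can be seen in a simulation: fix a PRF, draw repeated samples, and watch the SRF coefficients scatter around the population values. The PRF coefficients and sample sizes below are arbitrary choices for illustration:

```python
import random

random.seed(1)

# Fixed PRF: E(Y|X) = 1 + 0.5 X.  Each sample yields its own SRF.
b0_true, b1_true = 1.0, 0.5

def draw_sample_and_fit(n=100):
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [b0_true + b1_true * x + random.gauss(0, 1) for x in xs]
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    return ybar - b1 * xbar, b1

# 200 samples -> 200 different SRFs, centred on the PRF coefficients
fits = [draw_sample_and_fit() for _ in range(200)]
avg_b0 = sum(b0 for b0, _ in fits) / len(fits)
avg_b1 = sum(b1 for _, b1 in fits) / len(fits)
```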


In a regression of a time series that states data as a function of calendar year, what requirement
of regression is violated?

What is the difference between simple and multiple linear regression?

Simple linear regression relates the dependent variable to a single explanatory variable; multiple linear regression uses two or more explanatory variables.

I want to develop a regression model for predicting YardsAllowed as a function of Takeaways,
and I need to explain the statistical significance of the model.
What are the advantages and disadvantage of logistic regression compared with
linear regression analysis?

It all depends on what data set you're working with. There are quite a number of different
regression analysis models that run the gamut of all the functions you can think of. Obviously
some are more useful than others. Logistic regression is extremely useful for population
modelling because population growth follows a logistic curve. The final goal for any regression
analysis is to have a mathematical function that most closely fits your data, so advantages and
disadvantages depend entirely upon that.

What is the role of the stochastic error term in regression analysis?

Regression analysis is based on the assumption that the dependent variable is distributed
according to some function of the independent variables together with independent, identically
distributed random errors. If the error terms were not stochastic, then some of the properties of
the regression analysis would not be valid.

What is sample regression function?

To take a simple case, let's suppose you have a set of pairs (x1, y1), (x2, y2), ... (xn, yn). You have
obtained these by choosing the x values and then observing the corresponding y values
experimentally. This set of pairs would be called a sample.

For whatever reason, you assume that the y's are related to the x's by some function f(.), whose
parameters are, say, a1, a2, ... . In by far the most frequent case, the y's will be assumed to be a
simple linear function of the x's: y = f(x) = a + bx.

Since you have observed the y's experimentally they will almost always be subject to some error.
Therefore, you apply some statistical method for obtaining an estimate of f(.) using the sample of
pairs that you have.

This estimate can be called the sample regression function. (The theoretical or 'true' function f(.)
would simply be called the regression function, because it does not depend on the sample.)

What is used to show the relationship between two factors?

Not a function, because a function should not map one value to many (e.g. the square root).
Not the regression coefficient, since for an even function it would be 0. A scatter plot (or the
correlation coefficient) is what is usually used.

F-ratio in econometrics can be expressed as a function of R square; if one can be
expressed as a function of the other, why do you need both?

The F-ratio can be expressed as a function of the R² only under certain assumptions (e.g. the linear
regression model). There are econometric models where the R² is not meaningfully defined or
the F-ratio cannot be expressed in terms of the R², but you can still carry out an F-test.
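For the standard case of a linear regression with an intercept, testing all k slope coefficients jointly on n observations, the relationship is F = (R²/k) / ((1-R²)/(n-k-1)). The numeric values below are hypothetical:

```python
# Overall F statistic as a monotone transform of R-squared
# (linear regression, intercept included, k regressors, n observations).
def f_from_r2(r2, n, k):
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical regression: R^2 = 0.40, n = 63 observations, k = 2 regressors
F = f_from_r2(0.40, 63, 2)
```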
What are some examples where the mean the median and the mode might be the
same?

(10, 15, 15, 15, 20)

The answer above displays a sample in which the sample mean, sample median and sample
mode assume the same value.

If you were asking about populations, then the population mean, population median and
population mode are the same whenever the probability density function for the population is
symmetric. For example, the normal probability density function is symmetric, the t and uniform
density functions are symmetric. Many are.
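The sample quoted above can be checked directly with Python's statistics module:

```python
import statistics

# The sample (10, 15, 15, 15, 20): mean, median and mode all equal 15.
data = [10, 15, 15, 15, 20]
m_mean = statistics.mean(data)
m_median = statistics.median(data)
m_mode = statistics.mode(data)
```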
Is it possible for a function that has a horizontal asymptote to attain the value of
an asymptote?

Yes.

Think of a function that starts at the origin, increases rapidly at first and then decays gradually to
an asymptotic value of 0. It will have attained its asymptotic value at the start.

For example, the Fisher F distribution, which is often used in statistics to test the significance of
regression coefficients.

How do you calculate sample standard deviation?

Here's how you do it in Excel: use the function =STDEV(<range with data>). That function
calculates standard deviation for a sample.
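Outside Excel, the same quantity is the square root of the sum of squared deviations divided by n - 1 (not n). A sketch, checked against Python's statistics.stdev, which uses the same n - 1 convention as Excel's STDEV:

```python
import math
import statistics

# Sample standard deviation: divide the sum of squared deviations
# by n - 1, then take the square root.
def sample_std(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
s = sample_std(data)              # matches statistics.stdev(data)
```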

What is the difference between least squares Mean and Mean?

Mean is the sum of several values of the same type (x1, x2,..., xN ) divided by the number of
values.
Mean = (x1 + x2 + ... xN ) /N

The least squares method is used when doing a regression of a cloud of points { (x1,y1), (x2,y2),
etc. } by a function (linear, parabolic, hyperbolic, etc.). With this algorithm we get the function
f(x) that comes closest to the cloud of points:
f(x, β) ≈ y
β = (XᵀX)⁻¹Xᵀy = coefficients of the regression

The method works by differentiating the sum of squared errors with respect to the coefficients.
I think that "least squares mean" is not the proper term, because the result is a whole function f:
what would the mean of f(x) = a·x + b be?
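As a sketch of the β = (XᵀX)⁻¹Xᵀy formula quoted above, here it is written out for a simple regression y = b0 + b1·x, where XᵀX is a 2×2 matrix that can be inverted explicitly:

```python
# Normal equations for a simple regression with intercept:
# X'X = [[n, Sx], [Sx, Sxx]],  X'y = [Sy, Sxy],  beta = (X'X)^{-1} X'y.
def ols_normal_equations(xs, ys):
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # determinant of X'X
    b0 = (sxx * sy - sx * sxy) / det   # first row of the explicit inverse
    b1 = (n * sxy - sx * sy) / det     # second row
    return b0, b1

# Points lying exactly on y = 3 + 2x: the normal equations recover the line
b0, b1 = ols_normal_equations([0, 1, 2, 3], [3, 5, 7, 9])
```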

What is the difference between Multicollinearity and Autocorrelation?

The difference between multicollinearity and autocorrelation is that multicollinearity is a linear
relationship between 2 or more explanatory variables in a multiple regression, while auto-
correlation is a type of correlation between values of a process at different points in time, as a
function of the two times or of the time difference.
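Autocorrelation as "correlation between values at different points in time" can be sketched as a lag-1 sample autocorrelation; the trending series below is a hypothetical example chosen to be strongly autocorrelated:

```python
# Lag-1 sample autocorrelation: correlation between the process at
# time t and the process at time t-1.
def lag1_autocorr(xs):
    n = len(xs)
    xbar = sum(xs) / n
    num = sum((xs[t] - xbar) * (xs[t - 1] - xbar) for t in range(1, n))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

trending = [0.1 * t for t in range(100)]   # a smooth upward trend
rho = lag1_autocorr(trending)              # close to 1: strong autocorrelation
```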

When are OLS estimators BLUE?

For the classical regression model, the OLS (Ordinary Least Squares) estimators (the betas)
are BLUE (Best Linear Unbiased Estimator) when:

1. The regression is linear in the coefficients, is correctly specified and has an additive
   error term.
2. The mean of the error term is zero. (Including a constant term B0 in the regression
   forces the mean to be zero.)
3. The independent variables are not correlated with the error term. (If they are correlated,
   then the betas will be biased.)
4. Observations of the error term (the residuals) are not correlated with each other.
5. The error term has a constant variance (homoskedasticity).
6. No independent variable is a perfect linear function of any of the other independent
   variables. (If this is violated, perfect multicollinearity occurs.)

What are some examples of distribution function?

I will assume that you are asking about probability distribution functions. There are two types:
discrete and continuous. Some might argue that a third type exists, which is a mix of discrete and
continuous distributions.
When representing discrete random variables, the probability distribution is probability mass
function or "pmf." For continuous distributions, the theoretical distribution is the probability
density function or "pdf."
Some textbooks call pmf's discrete probability distributions.
Common pmf's are binomial, multinomial, uniform discrete and Poisson.
Common pdf's are the uniform, normal, log-normal, and exponential.
Two common pdf's used in sample size, hypothesis testing and confidence intervals are the "t
distribution" and the chi-square. Finally, the F distribution is used in more advanced hypothesis
testing and regression.

Why in Cox's regression model partial likelihood is used instead of ordinary likelihood function?

Cox model applies to observations in time (i.e. processes, or functions of t). The true likelihood
for that function would be a function of (functions of t), obtained by expressing the probability in
a space of (functions of t) as
[density]*[reference measure on (functions of t)]
The factor [density] would be the true likelihood.
The partial likelihood is a factor of [density] involving only the parameters of interest:
[density] = [partial likelihood]*[....]
There is no point in working with the full likelihood, in the sense that the nice properties of the
MLE apply to parameters from a finite-dimensional space, and would not automatically apply to
the full likelihood in the space of (functions of t).
That is why, for example, one needs to rework the large sample theory of estimators based on
partial likelihood.


A statistical function commonly used to describe a group of data?

A group of data is a sample.

What does statistic mean?

1. element of data: a single element of data from a collection
2. numerical value or function: a numerical value or function, e.g. a mean or standard deviation,
   used to describe a sample or population
3. piece of information: somebody or something treated as a piece of data or information.

What is the difference between correlation and regression?

I've included links to both these terms. Definitions from these links are given below. Correlation
and regression are frequently misunderstood terms. Correlation suggests or indicates that a linear
relationship may exist between two random variables, but does not indicate whether X causes
Y or Y causes X. In regression, we make the assumption that X, the independent variable, can
be related to Y, the dependent variable, and that an equation of this relationship is useful.
Definitions from Wikipedia: In probability theory and statistics, correlation (often measured as
a correlation coefficient) indicates the strength and direction of a linear relationship between
two random variables. In statistics, regression analysis refers to techniques for the modeling and
analysis of numerical data consisting of values of a dependent variable (also called a response
variable) and of one or more independent variables (also known as explanatory variables or
predictors). The dependent variable in the regression equation is modeled as a function of the
independent variables, corresponding parameters ("constants"), and an error term. The error term
is treated as a random variable. It represents unexplained variation in the dependent variable.
The parameters are estimated so as to give a "best fit" of the data. Most commonly the best fit is
evaluated by using the least squares method, but other criteria have also been used.

What is the meaning of Sampling distribution of the test statistic?

Given any sample size there are many samples of that size that can be drawn from the
population. If the population size is N and the sample size is n, then there are NCn (N choose n)
possible samples, and remember that the population can be infinite.

A test statistic is a value that is calculated from only the observations in a sample (no unknown
parameters are estimated). The value of the test statistic will change from sample to sample. The
sampling distribution of a test statistic is the probability distribution function for all the values
that the test statistic can take across all possible samples.
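The idea can be demonstrated by simulation: draw many samples of the same size from one population and record the statistic (here the sample mean) from each. The population parameters below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(2)

# Population: normal with mean 50, sd 10.  Statistic: the sample mean
# of a sample of size n = 25.
population_mean, population_sd, n = 50, 10, 25

means = []
for _ in range(2000):
    sample = [random.gauss(population_mean, population_sd) for _ in range(n)]
    means.append(statistics.mean(sample))

# The sampling distribution centres on the population mean, with
# spread sigma / sqrt(n) = 10 / 5 = 2.
centre = statistics.mean(means)
spread = statistics.stdev(means)
```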

How do you determine a line of a best fit or a slope of a graph?

The line of best fit is found by statistical calculations which this site is too crude for. Look up
least squares regression equation if you really wish to follow up. The slope of a graph is the
slope of the tangent to the graph curve at the point in question. If the function of the graph is y =
f(x) then this is the limit, as dx tends to 0, of [f(x + dx) - f(x)]/dx.
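The limit definition of the slope quoted above can be approximated numerically by shrinking dx; here for the hypothetical example f(x) = x², whose derivative at x = 3 is 6:

```python
# Difference quotient [f(x + dx) - f(x)] / dx: as dx shrinks, it
# approaches the slope of the tangent (the derivative) at x.
def difference_quotient(f, x, dx):
    return (f(x + dx) - f(x)) / dx

f = lambda x: x ** 2   # derivative at x = 3 is 6
approximations = [difference_quotient(f, 3, 10 ** -k) for k in (1, 3, 5)]
# dx = 0.1, 0.001, 0.00001 give 6.1, 6.001, 6.00001: converging to 6
```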


Is the inverse of an exponential function the quadratic function?

No. The inverse of an exponential function is a logarithmic function.

What is a random variable?

A random variable is a function that assigns unique numerical values to all possible outcomes of
a random experiment.

A real valued function defined on a sample space of an experiment is also called random
variable.

What is the parent function for the exponential function?


The parent function of the exponential function is a^x.

The inverse of a function is always a function?

No. The inverse of a function is itself a function only if the original function is one-to-one
(i.e. it passes the horizontal line test).

What do you call the inverse function of the exponential function?

Logarithmic Function

What is difference between chi square and reduced chi square?

A reduced chi-square value, calculated after a nonlinear regression has been performed, is the
chi-square value divided by the degrees of freedom (DOF). The degrees of freedom in this
case is N - P, where N is the number of data points and P is the number of parameters in the fitting
function that has been used. The reduced chi-square is useful in assessing the goodness of fit of a
non-linear regression equation. In fitting an equation to the data, it is possible to also "over fit",
which is to account for small and random errors in the data with additional parameters. The
reduced chi-square value will increase (show a worse fit) if the addition of a parameter does not
significantly improve the fit. You can also do a search on reduced chi-square value to better
understand its importance.

How does the graph of the Mandelbrot set function relate to composite
functions?

The Mandelbrot graph is generated iteratively and so is a function of a function of a function ...
and in that sense it is a composite function.
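The iteration in question is z → z² + c applied repeatedly, i.e. the same quadratic map composed with itself. A minimal sketch using the standard escape criterion |z| > 2:

```python
# Mandelbrot iteration: repeatedly compose the map z -> z^2 + c.
# If |z| ever exceeds 2, the orbit diverges and c is outside the set.
def mandelbrot_escapes(c, max_iter=100):
    z = 0j
    for _ in range(max_iter):
        z = z * z + c          # one more composition of the quadratic map
        if abs(z) > 2:
            return True        # orbit escaped: c is outside the set
    return False               # orbit stayed bounded (up to max_iter)

inside = not mandelbrot_escapes(0j)     # c = 0: orbit stays at 0 forever
outside = mandelbrot_escapes(1 + 1j)    # c = 1+i: orbit escapes quickly
```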

What are the ways in describing a function?

A formula or graph are two ways to describe a math function. How a math function is described
depends on the domain of the function or the complexity of the function.

Is an exponential function is the inverse of a logarithmic function?

Yes. The exponential function a^x and the logarithmic function log_a(x) (for a > 0, a ≠ 1)
are inverses of each other.

How do you change an exponential functions to a logarithmic function?

If y is an exponential function of x then x is a logarithmic function of y - so to change from an
exponential function to a logarithmic function, change the subject of the function from one
variable to the other.

When comparing two sample means what is the null hypothesis?

Often it is that the two means are the same. But more generally, it is that some function of the
two means is zero.

What is function notation in math terms?

An equation where the left side is a function of the right side.


f(x)=x+3 is function notation. The answer is a function of what x is.
f(g(x))= the answer the inside function substituted in the outside function.

What is the definition of the domain of function?

The domain of a function is the set of all input (x) values for which the function is defined.

Function of calorimeter?
Its function is to measure the heat absorbed or released by an object during a physical or
chemical change.


Is A function with a graph that is symmetric about the origin an even function?

An even function is symmetric about the y-axis. If a function is symmetric about the origin, it is
odd.

Example of fundamental difference between a polynomial function and an exponential function?

In a polynomial function such as x², the variable is the base and the exponent is fixed; in an
exponential function such as 2^x, the base is fixed and the variable is the exponent. As x grows,
an exponential function with base greater than 1 eventually outgrows any polynomial.

Is y plus x2 plus 1 a function?

If what is meant is y = x² + 1, then yes, it is a function: each value of x gives exactly one
value of y.

Is a sine function a periodic function?

Yes, the sine function is a periodic function. It has a period of 2 pi radians or 360 degrees.


Is it true or false that the cosine function is an odd function?

False; the cosine function is an even function, as cos(-x) = cos(x).
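The even/odd identities for cosine and sine are easy to spot-check numerically over a range of values:

```python
import math

# Cosine is even: cos(-x) == cos(x).  Sine, by contrast, is odd:
# sin(-x) == -sin(x).  Check both over a grid of x values.
xs = [0.1 * k for k in range(-30, 31)]
cos_even = all(math.isclose(math.cos(-x), math.cos(x), abs_tol=1e-12) for x in xs)
sin_odd = all(math.isclose(math.sin(-x), -math.sin(x), abs_tol=1e-12) for x in xs)
```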

What is the function of zero order hold?

Zero order hold is used in Digital - Analogue converters (DACs). It literally holds the digital
signal for the sample time, then moves to the next digital sample and holds that signal for the
next sample period, producing a staircase-shaped output.

What number will the function return if the input is 12.2?

It depends upon the definition of the function.

If the function is Output = 3 x Input, the function will return 36.6.
Similarly, if the function is Output = 3 + Input, the function will return 15.2.