
Linear Regression

Hypothesis Testing

Introduction
• A hypothesis is a claim (assumption) about a population parameter based on the
statistics of the sample

Types of errors
• Type I error: rejecting the null hypothesis when it is true; its probability is the
level of significance (α)
• Type II error: failing to reject the null hypothesis when it is false; its probability
is β (power = 1 − β)

Types of tests
• Two-tailed test: the alternative hypothesis allows deviations in both directions
(H0: θ = θ0 vs. Ha: θ ≠ θ0)
• One-tailed test: the alternative hypothesis allows deviations in one direction only
(e.g. Ha: θ > θ0 or Ha: θ < θ0)

Methods of rejecting the null hypothesis
• Critical value approach: reject H0 if the test statistic is more extreme than the
critical value at significance level α
• p-value approach: reject H0 if the p-value is smaller than α
• Confidence interval approach: reject H0 if the hypothesized value falls outside the
computed interval


Correlation Analysis (2 + 3)

Introduction
• Correlation analysis expresses the relationship between two data series in a single
number (the correlation coefficient)
• It measures the extent of a linear relationship between two variables.
o If all the points on a scatter plot fall on a line with a positive/negative
slope, then the correlation coefficient would be +1/−1
o A scatter plot with no linear relationship would have a correlation
coefficient of 0.

Limitation
• Correlation analysis only measures the linear relationship between two variables.
Two variables with a strong nonlinear relationship may still have a low correlation
• Correlation can also be unreliable if outliers are present, which may dramatically
affect the calculation of the correlation coefficient
• Correlation does not imply causation. Even if two variables are highly correlated,
it does not mean that one causes the other
• Spurious correlation may arise from chance relationships in the data, from a
calculation that mixes each of two variables with a third, or from two variables
that are both related to a third variable

Covariance
• The covariance of X with itself equals the variance of X

• Covariance measures the nature of the relationship between two variables
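A standard sample-covariance formulation, consistent with the bullets above:

$$\operatorname{Cov}(X,Y)=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1},\qquad \operatorname{Cov}(X,X)=\operatorname{Var}(X)$$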

Correlation Coefficient
• The numerical value of variance is not as meaningful as its square-root format –
standard deviation (variance is expressed in squared units)
• The numerical value of covariance is not as meaningful as its standardized format
– correlation coefficient
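The standardization divides covariance by the product of the two standard deviations:

$$r=\frac{\operatorname{Cov}(X,Y)}{s_X\,s_Y},\qquad -1\le r\le +1$$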
Hypothesis Testing for the Correlation Coefficient
• The null hypothesis states the correlation in the population is 0 (H0: ρ = 0); the
alternative hypothesis states the correlation in the population is different from
0 (Ha: ρ ≠ 0)

• The t-test can be used on the sample correlation, r, assuming both variables are
normally distributed.
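The standard test statistic for this t-test on the sample correlation:

$$t=\frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}},\qquad \text{df}=n-2$$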

• The decision rule is that the null hypothesis will be rejected if |t-stat| > t-critical

• When the sample size increases, the t-stat increases (larger numerator) and t-crit
decreases (higher degrees of freedom), so the required sample correlation, r, can
be relatively small and a false null hypothesis is more likely to be rejected
• When the sample correlation increases, the t-stat increases (larger numerator), so
the required sample size, n, can be relatively small and a false null hypothesis is
more likely to be rejected
Single Regression Analysis (3 + 3 + 3)

Introduction
• The linear regression model describes the relationship between the dependent and
the independent variables

• b0 and b1 are regression coefficients/parameters (b0 is the y-intercept term and b1
is the slope coefficient)
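In the usual notation, the population regression model and the fitted (estimated) line are:

$$Y_i=b_0+b_1X_i+\varepsilon_i,\qquad \hat{Y}_i=\hat{b}_0+\hat{b}_1X_i$$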
• Linear regression computes the best-fit line through the data

• Symbols with a hat are estimates of the regression parameters, used for testing or
making predictions about the dependent variable
• The regression equation implies the value of the dependent variable will increase
by b1 units when the value of the independent variable increases by one unit
• The linear regression line minimizes the sum of squared errors (SSE), aka the sum of
squared regression residuals (the differences between actual Y and predicted Y)

• The optimal values for the slope and intercept using a least squares approach can
be calculated with the following formulas:
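The standard least-squares estimators:

$$\hat{b}_1=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^{2}}=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)},\qquad \hat{b}_0=\bar{Y}-\hat{b}_1\bar{X}$$

A minimal numerical sketch of these formulas in Python; the data values and variable names are hypothetical, chosen only to illustrate the calculation:

```python
import numpy as np

# Hypothetical example data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])   # dependent variable
n = len(x)

# Slope: sample covariance of X and Y divided by sample variance of X
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# Intercept: forces the fitted line through (mean of X, mean of Y)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x                        # predicted values
sse = np.sum((y - y_hat) ** 2)             # sum of squared errors
see = np.sqrt(sse / (n - 2))               # standard error of estimate
sst = np.sum((y - y.mean()) ** 2)          # total variation
r_squared = 1 - sse / sst                  # coefficient of determination

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SEE = {see:.4f}, R^2 = {r_squared:.4f}")
```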
Limitations
• Regression relations can change over time (parameter instability)
• A regression relation may lose its usefulness once it becomes widely known and
acted upon by market participants
• Regression results are not valid if the underlying assumptions (below) are violated

Assumptions
• The relationship between the dependent and the independent variable is linear
• The independent variable is uncorrelated with the error term
• The expected value of the error term is 0
• The variance of the error term is constant for all observations (homoskedasticity)
• The error terms are uncorrelated (independent) across observations
• The error terms are normally distributed

Standard Error of Estimate (SEE) & Coefficient of Determination

• SEE measures how well a regression model captures the relationship between two
variables (the quality of the fit of a given linear regression model)
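A standard formulation of SEE, whose denominator is discussed in the next bullet:

$$\text{SEE}=\sqrt{\frac{\text{SSE}}{n-2}}=\sqrt{\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^{2}}{n-2}}$$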

• The denominator has n − 2 degrees of freedom because there are n observations
and two estimated parameters (b̂0, b̂1)
• SEE measures the standard deviation of the residual terms; the smaller the SEE,
the more accurate the predictions based on the model

• SEE tells how certain we can be about the prediction of Y, but it does not tell how
well the independent variable explains the variation in the dependent variable
• The coefficient of determination measures the fraction of the total variation in
the dependent variable that is explained by the independent variable.
• For a linear regression with only one independent variable:
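In the one-variable case, R² reduces to the squared correlation:

$$R^{2}=r^{2}$$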

• For a linear regression with one or more independent variables:
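The general definition, in terms of the sums of squares:

$$R^{2}=\frac{\text{RSS}}{\text{SST}}=\frac{\text{explained variation}}{\text{total variation}}=1-\frac{\text{SSE}}{\text{SST}}$$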


Confidence Intervals for Regression Coefficients
• A confidence interval is a range of values within which we believe the true
population parameter lies, with a certain degree of confidence (1 − α)
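The standard form of this interval for the slope coefficient, where t_c is the critical
t-value with n − 2 degrees of freedom:

$$\hat{b}_1\pm t_c\,s_{\hat{b}_1}$$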

• In a confidence interval, we examine whether the hypothesized value of the
population parameter, the slope coefficient (b1), lies within a computed interval
with a certain degree of confidence (1 − α)
• In hypothesis testing, we examine whether the estimated value of the parameter
from the sample data (b̂1) lies within a rejection region at a certain level of
significance (α)

Hypothesis Testing for Individual Regression Coefficients (t-Test)


• In linear regression, t-tests are often used to test hypotheses about the population
values of the intercept and slope coefficients.
• Null and alternate hypotheses
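In the usual notation, with B1 the hypothesized value (most often 0):

$$H_0:\ b_1=B_1\qquad H_a:\ b_1\neq B_1$$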

• Test statistic
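The standard test statistic for an individual coefficient:

$$t=\frac{\hat{b}_1-B_1}{s_{\hat{b}_1}},\qquad \text{df}=n-2$$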

• The decision rule is that the null hypothesis will be rejected if |t-stat| > t-critical

• A lower level of significance (α) increases the absolute value of t-crit and widens
the confidence interval, giving a lower likelihood of rejecting the null hypothesis
→ decreases Type I error and increases Type II error. Vice versa for a higher α
• A smaller standard error of the estimate narrows the confidence interval, giving a
higher likelihood of rejecting the null hypothesis
→ decreases Type II error (greater power) while Type I error stays fixed at α
• The p-value is the lowest level of significance at which the null hypothesis can be
rejected. If the p-value is smaller than α, the null hypothesis can be rejected
Analysis of Variance (the k + 1 parameters include the y-intercept and the k slope coefficients)
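A standard ANOVA table layout consistent with the F-test below, where k is the number
of slope coefficients and n is the number of observations:

Source of variation    | df        | Sum of squares | Mean sum of squares
Regression (explained) | k         | RSS            | MSR = RSS / k
Residual (unexplained) | n − k − 1 | SSE            | MSE = SSE / (n − k − 1)
Total                  | n − 1     | SST            |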

Hypothesis Testing for All Regression Coefficients in the Model (F-Test)


• A one-tailed F-test is used to determine whether all slope coefficients are
simultaneously 0. The F-test is not commonly used when there is only one
independent variable, as it duplicates the t-test (F = t²)
• Null hypothesis and alternate hypothesis
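In the usual notation:

$$H_0:\ b_1=b_2=\dots=b_k=0\qquad H_a:\ \text{at least one }b_j\neq 0$$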

• F-statistic
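The standard F-statistic, a one-tailed test with df = (k, n − k − 1):

$$F=\frac{\text{MSR}}{\text{MSE}}=\frac{\text{RSS}/k}{\text{SSE}/(n-k-1)}$$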

• The decision rule is that the null hypothesis will be rejected if F-stat > F-critical
(one-tailed)

Prediction Intervals
• A confidence interval is used for the regression parameters (e.g. the slope coefficient)
• A prediction interval is the corresponding interval for the dependent variable
• Predicted value of the dependent variable
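For the one-variable case, the point prediction at a given value X of the independent
variable:

$$\hat{Y}=\hat{b}_0+\hat{b}_1X$$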

• Variance of the prediction errors (s_x² is the variance of the independent variable)
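A standard formulation of the prediction-error variance for simple linear regression:

$$s_f^{2}=\text{SEE}^{2}\left[1+\frac{1}{n}+\frac{(X-\bar{X})^{2}}{(n-1)s_x^{2}}\right]$$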

• The (1 − α) percent prediction interval
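In the usual form, with t_c the critical t-value with n − 2 degrees of freedom:

$$\hat{Y}\pm t_c\,s_f$$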
