
Introduction to Regression Analysis Using Excel

Why and When to Use Regression

Dependent variable or response Y: the variable we wish to explain
Independent or regressor variables: the variables used to explain the dependent variable
The relation between Y and the regressor variables is called a regression model

Multiple Linear Regression Model


y = β0 + β1 x + ε

where y is the dependent variable, β0 is the intercept, β1 is the slope coefficient, x is the independent variable, and ε is the random error term, or residual

Linear Regression Assumptions


The probability distribution of the residual is normal (normality assumption)
The probability distribution of the residual has constant variance (constant variance assumption)
Residuals are statistically independent

Multiple Linear Regression Model


(continued)

[Figure: fitted regression line in the x-y plane, marking the observed value of y for xi, the predicted value of y for xi, the random error for this x value, the intercept β0, and the slope β1]

Least Squares Criterion


b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals
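As a rough illustration of the least squares criterion, the sketch below computes the intercept b0 and slope b1 for a single regressor using the standard closed-form solution, then checks that nudging either estimate increases the sum of squared residuals. The function names and the five data points are hypothetical and are not taken from the house price example.

```python
def fit_least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed y and predicted b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustration data (not the house price sample)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_least_squares(x, y)
sse = sum_squared_residuals(x, y, b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {sse:.4f}")

# Any other line gives a larger sum of squared residuals
print(sum_squared_residuals(x, y, b0 + 0.1, b1) > sse)
print(sum_squared_residuals(x, y, b0, b1 + 0.1) > sse)
```

Excel's Regression tool reports the same two estimates in its coefficients table. The hand computation here is only meant to show what is being minimized.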

Scatter Plot Examples


[Figure: example scatter plots of y against x showing linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]

Calculating the Correlation Coefficient


The correlation coefficient r measures the strength of the linear association between two variables
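A minimal sketch of how r can be computed by hand, assuming the usual Pearson sample correlation formula. The function name and the data are hypothetical and chosen only for illustration.

```python
import math


def correlation(x, y):
    """Pearson sample correlation coefficient between two equal-length lists."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(f"r = {r:.4f}")

# Points lying exactly on a line give r = +1 or r = -1
print(correlation([1, 2, 3], [2, 4, 6]))
print(correlation([1, 2, 3], [6, 4, 2]))
```

The sign of r gives the direction of the relationship, and its magnitude gives the strength, always within the range -1 to +1.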

Examples of Approximate r Values


[Figure: five scatter plots of y against x with r = -1, r = -.6, r = 0, r = +.3, and r = +1]

Linear Regression Example


A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (y) = house price in $1000s
Independent variable (x) = square feet

Sample Data for House Price Model

Scatter Plot And Correlation Coefficient

Regression Using Excel


Tools / Data Analysis / Regression

Excel Output
The regression equation is:

Graphical Presentation

Analysis of Variance (ANOVA)

Explained and Unexplained Variation


(continued)

Coefficient of Determination, R2
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

The coefficient of determination is also called R-squared and is denoted as R2


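R2 = SSR / SST

where 0 ≤ R2 ≤ 1, SSR is the sum of squares explained by the regression, and SST is the total sum of squares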

Adjusted R2
The adjusted R2 is a statistic that is adjusted for the size of the model, that is, the number of factors
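The slide does not show the formula, so the sketch below assumes the commonly used definition, adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1), where n is the number of observations and k the number of independent variables. The function name is hypothetical. The inputs are the house price model's figures: R2 of 58.08%, 10 houses, and one independent variable.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables.

    Assumes the common definition 1 - (1 - R2) * (n - 1) / (n - k - 1).
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# House price model: R2 = 0.5808, n = 10 houses, k = 1 regressor (square feet)
adj = adjusted_r_squared(0.5808, n=10, k=1)
print(f"Adjusted R2 = {adj:.4f}")
```

Because the penalty grows with k, adding a regressor raises adjusted R2 only if it improves the fit by more than the lost degree of freedom costs.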

Excel Output

58.08% of the variation in house prices is explained by variation in square feet

Residual Plot
RESIDUAL OUTPUT

Observation    Predicted House Price    Residuals
1              251.92316                -6.923162
2              273.87671                38.12329
3              284.85348                -5.853484
4              304.06284                3.937162
5              218.99284                -19.99284
6              268.38832                -49.38832
7              356.20251                48.79749
8              367.17929                -43.17929
9              254.6674                 64.33264
10             284.85348                -29.85348

[Figure: House Price Model Residual Plot, residuals (vertical axis, -60 to 80) against square feet (horizontal axis, 0 to 3000)]
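The predicted prices and residuals listed above are enough to recover the explained and unexplained variation and R2, since each observed price equals its predicted value plus its residual. The sketch below is one way to do that check. The variable names are hypothetical, and the numbers are copied from the residual output.

```python
# Values copied from the residual output (house price in $1000s)
predicted = [251.92316, 273.87671, 284.85348, 304.06284, 218.99284,
             268.38832, 356.20251, 367.17929, 254.6674, 284.85348]
residuals = [-6.923162, 38.12329, -5.853484, 3.937162, -19.99284,
             -49.38832, 48.79749, -43.17929, 64.33264, -29.85348]

# Observed price = predicted price + residual
observed = [p + e for p, e in zip(predicted, residuals)]
y_mean = sum(observed) / len(observed)

sst = sum((y - y_mean) ** 2 for y in observed)  # total variation
sse = sum(e ** 2 for e in residuals)            # unexplained variation
ssr = sst - sse                                 # explained variation
r2 = ssr / sst

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"R2 = {r2:.4f}")
print(f"Sum of residuals = {sum(residuals):.6f}")
```

The resulting R2 agrees with the 58.08% reported for the house price model, and the residuals sum to approximately zero, as expected for a least squares fit with an intercept.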

Thank you
