REGRESSION

What We Will Learn

• Simple Linear Regression
• Multiple Linear Regression
• Regression with Excel
What is Regression?

A statistical analysis technique for finding the relationship between two or more variables.
1. Simple Linear Regression
What is Simple Linear Regression?

• One dependent variable (y)
• One independent variable (x)

Correlation Coefficient

Measures the strength of the linear relationship between two variables.

ρ = Population correlation coefficient
r = Sample correlation coefficient
[Figure: five scatter plots of y against x showing r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]

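Sample correlation coefficient:

$$
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}
  = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
$$
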
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
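
As a rough illustration, the two forms of the formula can be checked against each other in Python. The function names and the five data points are made up for this sketch and do not come from the slides.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient, deviation-from-mean form."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def pearson_r_sums(x, y):
    """Same coefficient, computed from raw sums (the second form)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up illustration data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4), round(pearson_r_sums(x, y), 4))  # 0.7746 0.7746
```

Both forms are algebraically identical; the raw-sums form is the one that suits a hand-built table of xy, x², and y² columns.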
Significance Test for Correlation :

H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)

$$
t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \qquad \text{df} = n - 2
$$
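
A minimal sketch of this test in Python, assuming the critical value is looked up from a t table rather than computed. The inputs r = 0.6 and n = 12 are hypothetical; 2.2281 is the table value for 10 degrees of freedom and α/2 = .025 that these slides use in the worked example.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def reject_h0(t, t_crit):
    """Two-tailed decision: reject H0 when |t| exceeds the critical value."""
    return abs(t) > t_crit

# Hypothetical inputs for illustration only.
t = correlation_t(0.6, 12)
print(round(t, 4), reject_h0(t, 2.2281))  # 2.3717 True
```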
Simple Linear Regression Model :

$$
\hat{y}_i = b_0 + b_1 x
$$

where:
ŷi = Estimated (or predicted) y value
b0 = Estimate of the regression intercept
b1 = Estimate of the regression slope
x = Independent variable
Simple Linear Regression Model :

$$
b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
\qquad
b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}
$$

• b0 is the estimated average value of y when the value of x is zero
• b1 is the estimated change in the average value of y as a result of a one-unit change in x
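
The two estimates can be sketched in Python as follows. The function name and the data are illustrative assumptions, not part of the slides.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: raw-sums form, matching the slide's formula for b0.
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
    return b0, b1

# Made-up illustration data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

The intercept also equals ȳ - b1·x̄ (here 4 - 0.6 × 3 = 2.2), because the least-squares line always passes through the point of means.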
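Explained and Unexplained Variation

[Figure: an observed point (xi, yi), the fitted line, and the mean ȳ, showing how the total deviation splits into an explained and an unexplained part]

$$
SST = \sum (y_i - \bar{y})^2 \qquad SSR = \sum (\hat{y}_i - \bar{y})^2 \qquad SSE = \sum (y_i - \hat{y}_i)^2
$$

SST = Total sum of squares
SSR = Sum of squares regression (explained variation)
SSE = Sum of squares error (unexplained variation)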
Coefficient of Determination :

$$
R^2 = \frac{SSR}{SST}
$$

The portion of the total variation in the dependent variable that is explained by variation in the independent variable.

In simple linear regression, R² = r².
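
A short Python sketch of the decomposition, assuming a least-squares fit with one x variable. The function name and data are made up for illustration.

```python
def variation_components(x, y):
    """Return (SST, SSR, SSE) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
    return sst, ssr, sse

# Made-up illustration data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = variation_components(x, y)
r_squared = ssr / sst
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r_squared, 4))  # 6.0 3.6 2.4 0.6
```

For these points SST = SSR + SSE, and R² = 0.6 equals the square of the sample correlation (r ≈ 0.7746).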
[Figure: two scatter plots of y against x in which every point lies on the fitted line; R² = 1 in both]
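Standard Error of Estimate :

$$
s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}}
$$

SSE = Sum of squares error
n = Sample size
k = Number of independent variables in the model

Standard Deviation of The Slope :

$$
s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \frac{s_\varepsilon}{\sqrt{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}}
$$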
[Figure: left panels show the variation of observed y values from the regression line, small sε versus large sε; right panels show the variation in the slope of regression lines from different possible samples, small sb1 versus large sb1]
t Test for Slope :

H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)

$$
t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{df} = n - 2
$$
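
The standard error of estimate, the standard deviation of the slope, and the t statistic chain together, which a small Python sketch makes explicit. The function name, the `beta1_h0` parameter, and the data are assumptions for illustration.

```python
import math

def slope_t_test(x, y, beta1_h0=0.0):
    """Return (s_e, s_b1, t) for a simple linear regression of y on x."""
    n, k = len(x), 1  # k = 1 independent variable
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - k - 1))   # standard error of estimate
    s_b1 = s_e / math.sqrt(s_xx)         # standard deviation of the slope
    t = (b1 - beta1_h0) / s_b1           # compare with t table, df = n - 2
    return s_e, s_b1, t

# Made-up illustration data, not from the slides.
s_e, s_b1, t = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(s_e, 4), round(s_b1, 4), round(t, 4))  # 0.8944 0.2828 2.1213
```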
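Interval Estimate :

Average y, given xp :

$$
\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}
$$

Individual y, given xp :

$$
\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}
$$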
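
A hedged Python sketch of both intervals, assuming the critical t value is supplied from a table. The function name, its parameters, and the data are illustrative; 3.1824 is the table value for 3 degrees of freedom and α/2 = .025.

```python
import math

def interval_estimate(x, y, x_p, t_crit, individual=False):
    """Return (y_hat, half_width) at x_p for a simple linear regression.

    individual=False gives the interval for the average y,
    individual=True the wider interval for a single y value.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    extra = 1.0 if individual else 0.0  # the leading "1 +" under the root
    root = math.sqrt(extra + 1 / n + (x_p - x_bar) ** 2 / s_xx)
    return b0 + b1 * x_p, t_crit * s_e * root

# Made-up illustration data, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mean_fit, mean_hw = interval_estimate(x, y, 4, 3.1824)
ind_fit, ind_hw = interval_estimate(x, y, 4, 3.1824, individual=True)
print(round(mean_fit, 3), round(mean_hw, 3), round(ind_hw, 3))
```

The individual interval is always the wider of the two, since it adds the scatter of a single observation around the line to the uncertainty in the line itself.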
Example :

Consider Midwest Distribution Company, which supplies soft drinks and snack foods to convenience stores in Michigan, Illinois, and Iowa. The company believes there is a correlation between a store's sales and the number of years the store has been with the company. From a sample of 12 stores, estimate the sales of a store that has been with the company for 4.5 years.

Years   Sales
3       487
5       445
2       272
8       641
2       187
6       440
7       346
1       238
4       312
2       296
9       655
6       563
Example :

Correlation Coefficient

x       y       xy       x²     y²
3       487     1,461    9      237,169
5       445     2,225    25     198,025
2       272     544      4      73,984
8       641     5,128    64     410,881
2       187     374      4      34,969
6       440     2,640    36     193,600
7       346     2,422    49     119,716
1       238     238      1      56,644
4       312     1,248    16     97,344
2       296     592      4      87,616
9       655     5,895    81     429,025
6       563     3,378    36     316,969
55      4,882   26,145   329    2,255,942   (totals)

$$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
  = \frac{12(26{,}145) - (55)(4{,}882)}{\sqrt{\left[12(329) - (55)^2\right]\left[12(2{,}255{,}942) - (4{,}882)^2\right]}}
  = 0.8274
$$

t test for correlation :

H0: ρ = 0
HA: ρ ≠ 0
d.f. = 12 - 2 = 10, α/2 = .025, critical values ±tα/2 = ±2.2281

$$
t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.8274}{\sqrt{\dfrac{1 - .8274^2}{12 - 2}}} = 4.66
$$

Since t = 4.66 > 2.2281, reject H0.
Example :

Regression Model :

x       y       x - x̄    y - ȳ      (x - x̄)²   (x - x̄)(y - ȳ)
3       487     -1.58    80.17      2.51       -126.93
5       445     0.42     38.17      0.17       15.90
2       272     -2.58    -134.83    6.67       348.32
8       641     3.42     234.17     11.67      800.07
2       187     -2.58    -219.83    6.67       567.90
6       440     1.42     33.17      2.01       46.99
7       346     2.42     -60.83     5.84       -147.01
1       238     -3.58    -168.83    12.84      604.99
4       312     -0.58    -94.83     0.34       55.32
2       296     -2.58    -110.83    6.67       286.32
9       655     4.42     248.17     19.51      1096.07
6       563     1.42     156.17     2.01       221.24
55      4,882                       76.92      3769.17   (totals)

$$
b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{3769.17}{76.92} = 49.003
$$

$$
b_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2} = 182.235
$$
Example :

Regression Model :

Sales = 182.235 + 49.003(Years)

[Figure: scatter plot of Sales (0 to 700) against Years (0 to 10) with the fitted regression line]
Example :

Coefficient of Determination :

x       y       ŷ        (y - ȳ)²      (y - ŷ)²
3       487     329.24   6,426.69      24,886.69
5       445     427.25   1,456.69      315.01
2       272     280.24   18,180.03     67.92
8       641     574.26   54,834.03     4,454.08
2       187     280.24   48,326.69     8,694.00
6       440     476.25   1,100.03      1,314.40
7       346     525.26   3,700.69      32,133.38
1       238     231.24   28,504.69     45.72
4       312     378.25   8,993.36      4,388.81
2       296     280.24   12,284.03     248.33
9       655     623.26   61,586.69     1,007.15
6       563     476.25   24,388.03     7,524.76
55      4,882   4,882    269,781.67    85,080.25   (totals)

$$
SST = \sum (y - \bar{y})^2 = 269{,}781.67
$$

$$
SSE = \sum (y - \hat{y})^2 = 85{,}080.25
$$

$$
SSR = SST - SSE = 184{,}701.42
$$

$$
R^2 = \frac{SSR}{SST} = 0.6846
$$
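Example :

Standard Error of Estimate :

$$
s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{85{,}080.25}{12 - 1 - 1}} = 92.24
$$

Standard Deviation of The Slope :

$$
s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \frac{92.24}{\sqrt{76.92}} = 10.517
$$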
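
The regression quantities of this example (slope, intercept, sums of squares, R², standard error, and standard deviation of the slope) can be cross-checked with a few lines of Python. This is a sketch using only the 12 data pairs from the slides; the variable names are assumptions.

```python
import math

# Years with the company and sales for the 12 sampled stores.
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 296, 655, 563]

n, k = len(years), 1
x_bar = sum(years) / n
y_bar = sum(sales) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))

b1 = s_xy / s_xx           # slope
b0 = y_bar - b1 * x_bar    # intercept
fitted = [b0 + b1 * x for x in years]

sst = sum((y - y_bar) ** 2 for y in sales)
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))
ssr = sst - sse
r_squared = ssr / sst
s_e = math.sqrt(sse / (n - k - 1))
s_b1 = s_e / math.sqrt(s_xx)

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"R^2 = {r_squared:.4f}, s_e = {s_e:.2f}, s_b1 = {s_b1:.3f}")
```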
Example :

t Test for Slope :

H0: β1 = 0
H1: β1 ≠ 0
d.f. = 12 - 2 = 10, α/2 = .025, critical values ±tα/2 = ±2.2281

$$
t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{49.003 - 0}{10.517} = 4.6594
$$

Since t = 4.6594 > 2.2281, reject H0.
Example :

Estimate Individual y :

Sales = 182.235 + 49.003(Years)
      = 182.235 + 49.003(4.5)
      = 402.7485

$$
\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}
= 402.7485 \pm 2.2281\,(92.24)\sqrt{1 + \frac{1}{12} + \frac{(4.5 - 4.58)^2}{76.92}}
= 402.7485 \pm 213.92
$$
2. Multiple Linear Regression
What is Multiple Linear Regression?

• One dependent variable (y)
• Two or more independent variables (x1, x2, ...)

$$
\hat{y} = b_0 + b_1 x_1 + b_2 x_2
$$

[Figure: plot with axes y, x1, and x2]
Adjusted R Square :

R² never decreases when a new x variable is added to the model, even if the new variable has no correlation with y.

Adjusted R² shows the proportion of variation in y explained by all x variables, adjusted for the number of x variables.

$$
R_A^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)
$$
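
As a quick check of the formula, here is a one-function Python sketch. The function name is an assumption; the inputs (R² = 0.4866, n = 9, k = 2) are the values from the Excel example later in these slides.

```python
def adjusted_r2(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# R^2 = 48.66% with 9 observations and 2 x variables.
print(round(adjusted_r2(0.4866, 9, 2), 4))  # 0.3155
```

With so few observations the penalty is large: the adjusted figure drops from 48.66% to about 31.55%.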
F-Test for overall significance :

H0: β1 = β2 = … = βk = 0 (no linear relationship)
HA: at least one βi ≠ 0 (at least one independent variable affects y)

$$
F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}
$$

Degrees of freedom: D1 = k (numerator), D2 = n - k - 1 (denominator)
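
A minimal Python sketch of the F statistic. The function name and the sums of squares below are hypothetical round numbers chosen so the arithmetic is easy to follow.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    msr = ssr / k              # mean square regression
    mse = sse / (n - k - 1)    # mean square error
    return msr / mse

# Hypothetical values: SSR = 120, SSE = 80, n = 23 observations, k = 2.
print(f_statistic(120, 80, 23, 2))  # 15.0
```

The result is compared with the F table value for D1 = k and D2 = n - k - 1; H0 is rejected when the computed F is larger.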
t-Test for individual significance :

H0: βi = 0 (no linear relationship)
HA: βi ≠ 0 (linear relationship does exist between xi and y)

$$
t = \frac{b_i - 0}{s_{b_i}}, \qquad \text{df} = n - k - 1
$$
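Multicollinearity

High correlation between independent variables, so that the variables contribute redundant information.

Variance Inflationary Factor

$$
VIF_j = \frac{1}{1 - R_j^2}
$$

where Rj² is the coefficient of determination when xj is regressed on the other explanatory variables. If VIFj > 5, xj is highly correlated with the other explanatory variables.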
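
With exactly two explanatory variables, Rj² is simply the squared correlation between them, so the VIF can be sketched without fitting an auxiliary regression. The Python below assumes that two-variable case only; it uses the Salesman and Price columns from the Excel example later in these slides, and the function names are illustrative.

```python
import math

# Salesman and Price columns from the Excel example.
salesman = [10, 13, 12, 13, 15, 15, 14, 15, 17]
price = [70, 50, 65, 50, 45, 65, 63, 65, 64]

def pearson_r(x, y):
    """Sample correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two x variables (R_j^2 = r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

vif = vif_two_predictors(salesman, price)
print(round(pearson_r(salesman, price), 4), round(vif, 4))  # -0.1636 1.0275
```

A VIF this close to 1 is far below the threshold of 5, so these two variables carry little redundant information. With three or more x variables, each Rj² would need its own regression of xj on the rest.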
Qualitative Variables in Multiple Linear Regression

Used when an explanatory variable is qualitative: yes or no, female or male, holiday or not holiday. The variable is coded as 0 or 1.

$$
\hat{y} = b_0 + b_1 x_1 + b_2 x_2
$$

y = pie sales
x1 = price
x2 = holiday (x2 = 1 if a holiday occurred during the week, x2 = 0 if there was no holiday that week)
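
The 0/1 coding can be sketched in Python as below. The helper names are hypothetical. The second helper handles a variable with more than two categories by creating one 0/1 column per category except a chosen baseline; that is a common convention rather than something these slides spell out.

```python
def binary_dummy(values, one_label):
    """Code a two-level variable as 1 for one_label and 0 otherwise."""
    return [1 if v == one_label else 0 for v in values]

def category_dummies(values, baseline):
    """One 0/1 column per category, leaving out the baseline category."""
    levels = sorted(set(values) - {baseline})
    return {level: binary_dummy(values, level) for level in levels}

# A few illustrative rows.
holiday = ["Yes", "No", "Yes", "Yes"]
home_type = ["Family", "Condo", "Family", "SRO"]

print(binary_dummy(holiday, "Yes"))
# [1, 0, 1, 1]
print(category_dummies(home_type, "Family"))
# {'Condo': [0, 1, 0, 0], 'SRO': [0, 0, 0, 1]}
```

Leaving out the baseline keeps the dummy columns from summing to the intercept column, which would otherwise make the x variables perfectly collinear.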
3. Regression with Excel
How to use Excel to Calculate Regression?

Correlation with Excel

▰ Click Data > Data Analysis
▰ Select Correlation
▰ Define the variable range (all rows and columns)
▰ Click OK
How to use Excel to Calculate Regression?

Regression with Excel

▰ Click Data > Data Analysis
▰ Select Regression
▰ Define the y variable range and the x variable range
▰ Set the level of significance
▰ Click OK
Multiple Regression Example (Excel) :

Sales Salesman Price

564 10 70
601 13 50
560 12 65
616 13 50
674 15 45
630 15 65
554 14 63
532 15 65
661 17 64
Multiple Regression Example (Excel) :

Correlation

            Sales           Salesman        Price
Sales       1               0.518062386     -0.545613556
Salesman    0.518062386     1               -0.163554532
Price       -0.545613556    -0.163554532    1

The number of salesmen has a positive correlation with sales.

Price has a negative correlation with sales.


Multiple Regression Example (Excel) :

Regression

Sales = 610.86 + 10.75(Salesman) - 2.68(Price)

• Sales increase by 10.75 for each additional salesman, holding price constant
• Sales decrease by 2.68 for each one-unit increase in price, holding the number of salesmen constant
Multiple Regression Example (Excel) :

Regression

• 48.66% of the variation in sales is explained by the variation in salesman and price (R²)
• 31.55% of the variation in sales is explained by the variation in salesman and price, taking into account the sample size and number of independent variables (adjusted R²)
• The standard error of estimate of the regression model is 41.35
Multiple Regression Example (Excel) :

Regression

H0: β1 = β2 = 0
HA: β1 and β2 not both zero
α = .05, critical value F.05 = 5.143

F = 2.844

Since F = 2.844 < 5.143, do not reject H0.
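
The figures Excel reports for this example can be cross-checked in plain Python by solving the normal equations (XᵀX)b = Xᵀy. This is a sketch, not a description of how Excel computes them; the helper names `solve` and `fit` are assumptions, and the data are the nine rows of the example table.

```python
import math

sales = [564, 601, 560, 616, 674, 630, 554, 532, 661]
salesman = [10, 13, 12, 13, 15, 15, 14, 15, 17]
price = [70, 50, 65, 50, 45, 65, 63, 65, 64]

def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(y, *xs):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*xs)]  # leading 1 for the intercept
    cols = list(zip(*rows))
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

b = fit(sales, salesman, price)
n, k = len(sales), 2
y_bar = sum(sales) / n
y_hat = [b[0] + b[1] * s + b[2] * p for s, p in zip(salesman, price)]
sst = sum((y - y_bar) ** 2 for y in sales)
sse = sum((y - f) ** 2 for y, f in zip(sales, y_hat))

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
s_e = math.sqrt(sse / (n - k - 1))
f_stat = ((sst - sse) / k) / (sse / (n - k - 1))

print(f"Sales = {b[0]:.2f} + {b[1]:.2f}(Salesman) + ({b[2]:.2f})(Price)")
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
print(f"s_e = {s_e:.2f}, F = {f_stat:.3f}")
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; statistical packages generally use more numerically stable decompositions.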
Multiple Regression Example (Excel) :

Regression

H0: βi = 0
HA: βi ≠ 0
df = 9 - 2 - 1 = 6, α/2 = .025, critical values ±tα/2 = ±2.4469

t = 1.486 (Salesman), t = -1.597 (Price)

Both t values lie between -2.4469 and 2.4469, so do not reject H0 for either variable.
Multiple Regression Example (Excel) :

Regression

Sales are estimated to change by between -6.95 and 28.46 for each additional salesman. Since 0 is included in that interval, there is no significant relationship between sales and salesman.
BULKOM
1. A student intern at the investment firm of McMillan & Associates was given the assignment of determining whether there is a positive correlation between the number of individual stocks in a client's portfolio and the annual rate of return for the portfolio. The intern selected a simple random sample of 10 client portfolios and determined the number of individual company stocks and the annual rate of return earned by the client on his or her portfolio.

Number of Stocks   Rate of Return
9                  0.13
16                 0.16
25                 0.21
16                 0.18
20                 0.18
16                 0.19
20                 0.15
20                 0.17
16                 0.13
9                  0.11
2. Produce the regression equation. Is there a significant relationship between the independent variables and the dependent variable?

Y      X1    X2
103    50    10
85     45    8
115    37    11
73     35    7
97     44    10
102    51    11
65     42    6
3. A real estate agent wishes to predict the selling price of residences using the size (square feet) and whether the residence is a condominium, single-family home, or SRO. The agent believes that holidays are also related to the selling price. Produce the regression equation to predict the selling price. How much of the variation in selling price is explained? A sample of 20 residences was obtained with the following results :
Selling Price Square Feet Type Holiday
US$ 269,700.00 1500 Family Yes
US$ 211,800.00 2085 Condo No
US$ 257,100.00 1450 Family Yes
US$ 224,400.00 1836 SRO Yes
US$ 245,800.00 1730 Family No
US$ 180,900.00 1726 SRO No
US$ 346,200.00 2300 Family Yes
US$ 243,600.00 1650 Condo No
US$ 289,000.00 1950 Family No
US$ 164,400.00 1545 SRO No
US$ 175,600.00 1375 Condo No
US$ 238,000.00 1825 Condo Yes
US$ 230,500.00 1650 Family No
US$ 253,300.00 1960 SRO Yes
US$ 213,200.00 1360 Condo Yes
US$ 180,200.00 1200 Condo No
US$ 277,100.00 2000 SRO Yes
US$ 297,200.00 1755 Family Yes
US$ 265,200.00 1850 SRO Yes
US$ 266,100.00 1630 Family No
Thank You
