
THE LEAST SQUARES METHOD (METODE KUADRAT TERKECIL)

Budi Waluyo

FAKULTAS PERTANIAN UNIVERSITAS BRAWIJAYA 2009

The Least Squares Method

The least squares method is used to obtain estimators of the linear regression coefficients.

The Simple Regression Model

The simple linear regression model is written as:
Y = β0 + β1X + ε (general model), or Yi = β0 + β1Xi + εi (model for each observation)

The error term is ε (or εi):

ε = Y − Ŷ = Y − b0 − b1X, or εi = Yi − Ŷi = Yi − b0 − b1Xi

Graphical - Judgmental Solution

The red points are the experimental values, denoted Yi, which appear to lie along a straight line. This line is the model to be estimated, by estimating its coefficients b0 and b1, giving the fitted equation Ŷi = b0 + b1Xi. The vertical segment connecting each experimental point to the fitted line is the error.

Graphical - Judgmental Solution

[Figure: a fitted straight line with intercept b0 and slope b1]

The Least Square Method


yi : y1  y2  y3  . .  yn
xi : x1  x2  x3  . .  xn
ŷi : b0 + b1x1   b0 + b1x2   b0 + b1x3   . .   b0 + b1xn

Min Z = Σi=1..n (yi − ŷi)² = Σi=1..n (yi − b0 − b1xi)²

Classic Minimization
Min Z = Σi=1..n (yi − b0 − b1xi)²

We want to minimize this function with respect to b0 and b1. This is a classic optimization problem: as we may remember from calculus, to find the minimum value we take the derivatives and set them equal to zero.
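As a quick numeric sanity check of this minimization idea (a sketch in Python, using the Pizza example data that appears later in these slides, and the estimates b0 = 60, b1 = 5 derived there):

```python
# Sum of squared errors Z(b0, b1) for the Pizza example data
# (student population x, quarterly sales y, from the slides).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

def Z(b0, b1):
    """Sum of squared errors for the candidate line y = b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

best = Z(60, 5)  # the least-squares solution from the slides
print(best)      # 1530
# Perturbing either coefficient can only increase Z:
print(all(Z(60 + d0, 5 + d1) >= best
          for d0 in (-2, 0, 2) for d1 in (-0.2, 0, 0.2)))  # True
```

Because Z is a convex quadratic in (b0, b1), any perturbation away from the minimizer raises the value.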

The Least Square Method


Note: our unknowns are b0 and b1. xi and yi are known; they are our data.

yi : y1  y2  y3  . .  yn
xi : x1  x2  x3  . .  xn
ŷi : b0 + b1x1   b0 + b1x2   b0 + b1x3   . .   b0 + b1xn

Min Z = Σi=1..n (yi − b0 − b1xi)²

Find the derivatives of Z with respect to b0 and b1 and set them equal to zero.

Derivatives

Z = Σi=1..n (yi − b0 − b1xi)²

∂Z/∂b0 = Σi=1..n 2(−1)(yi − b0 − b1xi) = 0

∂Z/∂b1 = Σi=1..n 2(−xi)(yi − b0 − b1xi) = 0

b0 and b1

b1 = (Σxy − (Σx)(Σy)/n) / (Σx² − (Σx)²/n)

b0 = ȳ − b1x̄
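These closed-form formulas translate directly into code. A minimal sketch (the function name `least_squares` is my own):

```python
def least_squares(x, y):
    """Closed-form least-squares estimates:
    b1 = (Sxy - Sx*Sy/n) / (Sxx - Sx^2/n),  b0 = y_bar - b1*x_bar.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b1 = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    b0 = sy / n - b1 * sx / n
    return b0, b1

# On the Pizza data used in the next slides this returns b0 = 60, b1 = 5:
b0, b1 = least_squares([2, 6, 8, 8, 12, 16, 20, 20, 22, 26],
                       [58, 105, 88, 118, 117, 137, 157, 169, 149, 202])
print(b0, b1)  # 60.0 5.0
```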

Pizza Restaurant Example


We collect a set of data from random stores of our Pizza restaurant example:

Restaurant i:                   1    2    3    4    5    6    7    8    9   10
Student population (1000s) xi:  2    6    8    8   12   16   20   20   22   26
Quarterly sales ($1000s) yi:   58  105   88  118  117  137  157  169  149  202

Example
Restaurant i:   1    2    3    4     5     6     7     8     9    10 | Total
Xi:             2    6    8    8    12    16    20    20    22    26 |   140
Yi:            58  105   88  118   117   137   157   169   149   202 |  1300
XiYi:         116  630  704  944  1404  2192  3140  3380  3278  5252 | 21040
Xi²:            4   36   64   64   144   256   400   400   484   676 |  2528

b1

b1 = (Σxy − (Σx)(Σy)/n) / (Σx² − (Σx)²/n)

b1 = (21040 − (140)(1300)/10) / (2528 − (140)²/10)

b1 = 2840 / 568 = 5

b0

Ȳ = b0 + b1X̄

Ȳ = 1300/10 = 130,  X̄ = 140/10 = 14

130 = b0 + 5(14)  ⇒  b0 = 60

Estimated Regression Equation

Ŷ = 60 + 5X

Now we can predict. For example, if one of the restaurants of this pizza chain is close to a campus with 16,000 students, we predict the mean of its quarterly sales as:

Ŷ = 60 + 5(16) = 140 thousand dollars
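Prediction with the fitted equation is then a one-liner (a sketch; `predict` is a hypothetical helper name):

```python
b0, b1 = 60, 5  # estimates from the slides

def predict(x):
    """Mean quarterly sales ($1000s) for student population x (1000s)."""
    return b0 + b1 * x

print(predict(16))  # 140 -> $140,000 predicted mean quarterly sales
```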

Summary: The Simple Linear Regression Model

Simple linear regression model: Y = β0 + β1X + ε
Simple linear regression equation: E(Y) = β0 + β1X
Estimated simple linear regression equation: Ŷ = b0 + b1X

Summary: The Least Squares Method


Least squares criterion: min Σi (Yi − Ŷi)²

where Yi = observed value of the dependent variable for the i-th observation
and Ŷi = estimated value of the dependent variable for the i-th observation

Summary: The Least Squares Method


Slope for the estimated regression equation:

b1 = (ΣXiYi − (ΣXi)(ΣYi)/n) / (ΣXi² − (ΣXi)²/n)

Y-intercept for the estimated regression equation:

b0 = Ȳ − b1X̄

Xi = value of the independent variable for the i-th observation
Yi = value of the dependent variable for the i-th observation
X̄ = mean value of the independent variable
Ȳ = mean value of the dependent variable
n = total number of observations

Coefficient of Determination

Question: how well does the estimated regression line fit the data?

The coefficient of determination is a measure of goodness of fit: the goodness of fit of the estimated regression line to the data. Given an observation with values Yi and Xi, we put Xi into the equation and get Ŷi = b0 + b1Xi.

(Yi − Ŷi) is called the residual. It is the error in using Ŷi to estimate Yi.

SSE = Σi (Yi − Ŷi)²

SSE : Pictorial Representation

[Figure: residual Y10 − Ŷ10 shown against the fitted line Ŷ = 60 + 5X]

SSE Computations

i:             1    2    3    4    5    6    7    8    9   10
Xi:            2    6    8    8   12   16   20   20   22   26
Yi:           58  105   88  118  117  137  157  169  149  202
Ŷi = 60+5Xi:  70   90  100  100  120  140  160  160  170  190
Yi − Ŷi:     −12   15  −12   18   −3   −3   −3    9  −21   12
(Yi − Ŷi)²:  144  225  144  324    9    9    9   81  441  144

SSE = 1530

SSE = 1530 measures the error in using the estimated equation to predict sales.
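The SSE table above can be reproduced in a few lines (a sketch using the slide's data):

```python
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

y_hat = [60 + 5 * xi for xi in x]                 # fitted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # Yi - Yhat_i
SSE = sum(r * r for r in residuals)
print(residuals)  # [-12, 15, -12, 18, -3, -3, -3, 9, -21, 12]
print(SSE)        # 1530
```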

SST Computations
Now suppose we want to estimate sales without using the student population. In other words, we want to estimate Y without using X. If Y does not depend on X, then b1 = 0, and y = b0 + b1x reduces to b0 = ȳ. Here we do not take x into account; we simply use the average of y as our sales forecast: ȳ = (Σ yi) / n = 1300/10 = 130. This is our estimate for the next value of y. Given an observation with values yi and xi, (yi − ȳ) is the error in using ȳ to estimate yi. SST = Σ (yi − ȳ)²

SST : Pictorial Representation

[Figure: deviation Y10 − Ȳ shown against the horizontal line Ȳ = 130]

SST Computations
i:            1    2     3    4    5   6    7     8    9    10 | Total
Xi:           2    6     8    8   12  16   20    20   22    26
Yi:          58  105    88  118  117 137  157   169  149   202
Yi − Ȳ:     −72  −25   −42  −12  −13   7   27    39   19    72
(Yi − Ȳ)²: 5184  625  1764  144  169  49  729  1521  361  5184 | SST = 15730

SST = 15730 measures the error in using the mean of the y values to predict sales.
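The same quantity in code (a sketch; ȳ is written `y_bar`):

```python
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

y_bar = sum(y) / len(y)                   # 1300 / 10 = 130.0
SST = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
print(y_bar)  # 130.0
print(SST)    # 15730.0
```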

SSE , SST and SSR


SST: a measure of how well the observations cluster around ȳ
SSE: a measure of how well the observations cluster around ŷ
If x played no role in the value of y, then SST = SSE
If x played the full role in the value of y, then SSE = 0
SST = SSE + SSR
SSR: sum of squares due to regression
SSR is the explained portion of SST; SSE is the unexplained portion of SST

Coefficient of Determination for Goodness of Fit


SSE = SST − SSR
The largest possible value of SSE is SSE = SST
SSE = SST ⇒ SSR = 0
SSR/SST = 0 ⇒ the worst fit
SSR/SST = 1 ⇒ the best fit

Coefficient of Determination for Pizza example

In the Pizza example: SST = 15730, SSE = 1530, so SSR = 15730 − 1530 = 14200.
r² = SSR/SST is the coefficient of determination, with 0 ≤ r² ≤ 1.
r² = 14200/15730 = .9027
In other words, 90% of the variation in y can be explained by the regression line.
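Putting the pieces together (a sketch using the totals computed in the slides):

```python
SST, SSE = 15730, 1530

SSR = SST - SSE   # explained portion of the total variation
r2 = SSR / SST    # coefficient of determination
print(SSR)            # 14200
print(round(r2, 4))   # 0.9027
```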

SST Calculations

SST = Σ(Y − Ȳ)²

SST = ΣY² − (ΣY)²/n

SST Calculations
SST = ΣY² − (ΣY)²/n

Observation:   1      2     3      4      5      6      7      8      9     10 | Total
Xi:            2      6     8      8     12     16     20     20     22     26
Yi:           58    105    88    118    117    137    157    169    149    202 |   1300
Yi²:        3364  11025  7744  13924  13689  18769  24649  28561  22201  40804 | 184730

SST = 184730 − (1300)²/10 = 15730

SSR Calculations

SSR = [ΣXY − (ΣX)(ΣY)/n]² / (ΣX² − (ΣX)²/n)

SSR = [21040 − (140)(1300)/10]² / (2528 − (140)²/10) = (2840)² / 568 = 14200

SSR Calculations

r² = SSR / SST = 14200 / 15730

r² = .9027

SSE = SST − SSR = 15730 − 14200 = 1530

Example : Reed Auto Sales


Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales, showing the number of TV ads run and the number of cars sold in each sale, are shown below.

Number of TV ads:     1   3   2   1   3
Number of cars sold: 14  24  18  17  27

Example : Reed Auto Sales

SST = ΣY² − (ΣY)²/n
SSR = [ΣXY − (ΣX)(ΣY)/n]² / (ΣX² − (ΣX)²/n)

We need to calculate ΣX, ΣY, ΣXY, ΣX², ΣY²:

X:    1    3    2    1    3 | 10
Y:   14   24   18   17   27 | 100
XY:  14   72   36   17   81 | 220
X²:   1    9    4    1    9 | 24
Y²: 196  576  324  289  729 | 2114

SST = 2114 − (100)²/5 = 114

SSR = [220 − (10)(100)/5]² / (24 − (10)²/5) = [220 − 200]² / (24 − 20) = 400/4 = 100

Example : Reed Auto Sales

Alternatively, we could compute SSE and SST and then find SSR = SST − SSE.

With b1 = (220 − 200)/(24 − 20) = 5 and b0 = ȳ − b1x̄ = 20 − 5(2) = 10:

X:           1   3   2   1   3
Y:          14  24  18  17  27
Ŷ = 10+5X:  15  25  20  15  25
Y − Ŷ:      −1  −1  −2   2   2
(Y − Ŷ)²:    1   1   4   4   4 | SSE = 14

SSR = SST − SSE = 114 − 14 = 100

Example : Reed Auto Sales


Coefficient of Determination r 2 = SSR/SST = 100/114 = .88 The regression relationship is very strong since 88% of the variation in number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
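The whole Reed Auto computation fits in a short script (a sketch using the shortcut formulas from these slides):

```python
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

SST = syy - sy ** 2 / n                               # 2114 - 2000 = 114
SSR = (sxy - sx * sy / n) ** 2 / (sxx - sx ** 2 / n)  # 20^2 / 4 = 100
r2 = SSR / SST
print(SST, SSR, round(r2, 2))  # 114.0 100.0 0.88
```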

The Correlation Coefficient


Correlation coefficient = (sign of b1) × (square root of the coefficient of determination)

rxy = (sign of b1) √r²

The correlation coefficient is a measure of the strength of a linear association between two variables. It has a value between −1 and +1.
rxy = +1 : the two variables are perfectly related through a line with positive slope.
rxy = −1 : the two variables are perfectly related through a line with negative slope.
rxy = 0 : the two variables are not linearly related.

The Correlation Coefficient : example


In our Pizza example, r² = .9027 and the sign of b1 is positive:

rxy = (sign of b1) √r² = +√.9027 = .9501


There is a strong positive relationship between x and y.
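The sign adjustment can be written with `math.copysign` (a sketch, using the Pizza example values):

```python
import math

b1 = 5       # slope from the Pizza example (positive)
r2 = 0.9027  # coefficient of determination

# Square root of r2, carrying the sign of the slope b1:
r_xy = math.copysign(math.sqrt(r2), b1)
print(round(r_xy, 4))  # 0.9501
```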

Correlation Coefficient and Coefficient of Determination


The Coefficient of Determination and the Correlation Coefficient are both measures of association between variables. The correlation coefficient applies to a linear relationship between two variables; the coefficient of determination applies to linear and nonlinear relationships between two or more variables.

Exercise
Given the following experimental data on rice yield (t/ha), plant height (cm) and tiller number, determine the relationships of these variables with each other using correlation and regression analysis. Obtain a model relating YIELD to the variables PLTHT and TILLER# and interpret results. Test for the significance of the parameter estimates and the regression equation. Evaluate the adequacy of the model obtained.

HAPPY STUDYING
