Budi Waluyo
The red points are the experimental values, denoted Yi, which are assumed to lie along a straight line. This line is the model to be estimated, by estimating its coefficients b0 and b1, which gives the equation Ŷi = b0 + b1 Xi. The vertical segment (perpendicular to the horizontal axis) connecting an experimental point to the fitted line is called the error.
Fitted values: ŷi = b0 + b1 xi, for i = 1, 2, 3, ..., n

$$\min Z = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
Classic Minimization
$$\min_{b_0,\, b_1} Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
We want to minimize this function with respect to b0 and b1. This is a classic optimization problem. As we may remember from school calculus, to find the minimum we take the derivative with respect to each unknown and set it equal to zero.
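To make the objective concrete, here is a minimal sketch that evaluates Z on a coarse grid of candidate coefficients and picks the smallest value. The four data points and the helper name `sum_squared_errors` are hypothetical, chosen only for illustration; a grid search is not the method used in these slides, it simply shows what "minimizing Z" means.

```python
# Toy data (hypothetical, for illustration only; not from the slides).
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 7]


def sum_squared_errors(b0, b1):
    """Z(b0, b1): sum over i of (y_i - b0 - b1 * x_i)^2."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Coarse grid search: try b0 and b1 in steps of 0.1 on the interval [0, 3].
candidates = [(i / 10, j / 10) for i in range(31) for j in range(31)]
best_b0, best_b1 = min(candidates, key=lambda c: sum_squared_errors(*c))

print(best_b0, best_b1, sum_squared_errors(best_b0, best_b1))
```

Setting the derivatives to zero finds the same minimum exactly, without having to guess a grid.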
xi      ŷi
x1      b0 + b1 x1
x2      b0 + b1 x2
x3      b0 + b1 x3
...     ...
xn      b0 + b1 xn
Find the derivatives of Z with respect to b0 and b1 and set them equal to zero.
Derivatives
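$$Z = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

$$\frac{\partial Z}{\partial b_0} = \sum_{i=1}^{n} 2(-1)(y_i - b_0 - b_1 x_i) = 0$$

$$\frac{\partial Z}{\partial b_1} = \sum_{i=1}^{n} 2(-x_i)(y_i - b_0 - b_1 x_i) = 0$$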
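As a bridging step that is not spelled out on the slides, dividing each condition by -2 and separating the sums gives two linear equations in b0 and b1 (often called the normal equations):

$$\sum_{i=1}^{n} y_i = n\, b_0 + b_1 \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$$

Dividing the first equation by n gives $b_0 = \bar{y} - b_1 \bar{x}$. Substituting this into the second equation leaves a single equation in b1:

$$b_1 \left( \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$$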
b0 and b1

$$b_1 = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$
Example
Restaurant i     Xi      Yi     Xi Yi     Xi^2
1                 2      58       116        4
2                 6     105       630       36
3                 8      88       704       64
4                 8     118       944       64
5                12     117      1404      144
6                16     137      2192      256
7                20     157      3140      400
8                20     169      3380      400
9                22     149      3278      484
10               26     202      5252      676
Total           140    1300     21040     2528
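b1

$$b_1 = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{21040 - \frac{(140)(1300)}{10}}{2528 - \frac{(140)^2}{10}} = \frac{2840}{568} = 5$$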
b0

$$\bar{Y} = b_0 + b_1 \bar{X}$$

$$\bar{Y} = \frac{1300}{10} = 130, \qquad \bar{X} = \frac{140}{10} = 14$$

$$130 = b_0 + 5(14) \;\Rightarrow\; b_0 = 60$$

$$\hat{Y} = 60 + 5X$$
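The hand calculation can be checked with a few lines of Python. This is a minimal sketch that applies the sum formulas directly to the ten restaurants in the example; the function name `fit_line` is a hypothetical helper, not something defined in the slides.

```python
# Data from the restaurant example: X_i and Y_i for the 10 restaurants.
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def fit_line(xs, ys):
    """Return (b0, b1) computed from the least-squares sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(xs, ys)
print(b0, b1)  # 60.0 5.0
```

Once b0 and b1 are known, the fitted line can be evaluated at any X as `b0 + b1 * X`.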
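Now we can predict. For example, suppose one of the restaurants of this pizza chain is close to a campus with 16,000 students (X = 16, taking X in thousands of students). We predict that the mean of its quarterly sales is Ŷ = 60 + 5(16) = 140, in the same units as Y.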
Simple Linear Regression Model: $Y = \beta_0 + \beta_1 X + \varepsilon$
Simple Linear Regression Equation: $E(Y) = \beta_0 + \beta_1 X$
Estimated Simple Linear Regression Equation: $\hat{Y} = b_0 + b_1 X$
where
Yi = observed value of the dependent variable for the i-th observation
Ŷi = estimated value of the dependent variable for the i-th observation
$$b_1 = \frac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}$$
Xi = value of the independent variable for the i-th observation
Yi = value of the dependent variable for the i-th observation
X̄ = mean value of the independent variable
Ȳ = mean value of the dependent variable
n = total number of observations
Coefficient of Determination
Question: How well does the estimated regression line fit the data?
The coefficient of determination is a measure of the goodness of fit of the estimated regression line to the data. Given an observation with values Yi and Xi, we put Xi in the equation and get Ŷi = b0 + b1 Xi. The difference (Yi - Ŷi) is called the residual. It is the error in using Ŷi to estimate Yi. The sum of the squared residuals is SSE = Σ (Yi - Ŷi)².
Ŷ = 60 + 5x
SSE Computations
i        Xi     Yi    Ŷi = 60 + 5Xi    (Yi - Ŷi)    (Yi - Ŷi)^2
1         2     58         70              -12           144
2         6    105         90               15           225
3         8     88        100              -12           144
4         8    118        100               18           324
5        12    117        120               -3             9
6        16    137        140               -3             9
7        20    157        160               -3             9
8        20    169        160                9            81
9        22    149        170              -21           441
10       26    202        190               12           144
Total                                                   1530
SSE = 1530 measures the error in using the estimated equation to predict sales.
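The SSE table can be reproduced with a short script. This is a minimal sketch using the example data and the estimated equation Ŷ = 60 + 5X; the variable names are illustrative.

```python
# Data from the restaurant example.
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60, 5  # coefficients of the estimated regression equation

predictions = [b0 + b1 * x for x in xs]                        # Y-hat column
residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]   # Y - Y-hat column
sse = sum(r ** 2 for r in residuals)                           # sum of squared residuals

print(residuals)  # [-12, 15, -12, 18, -3, -3, -3, 9, -21, 12]
print(sse)        # 1530
```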
SST Computations
Now suppose we want to estimate sales without using the independent variable. In other words, we want to estimate Y without using X. If Y does not depend on X, then b1 = 0, and ŷ = b0 + b1 x reduces to ŷ = b0 = ȳ. Here we do not take x into account; we simply use the average of y as our sales forecast:
ȳ = (Σ yi) / n = 1300/10 = 130
This is our estimate for the next value of y. Given an observation with values yi and xi, (yi - ȳ) is the error in using ȳ, rather than x, to estimate yi.
SST = Σ (yi - ȳ)²
i        Xi     Yi    (Yi - Ȳ)    (Yi - Ȳ)^2
1         2     58       -72         5184
2         6    105       -25          625
3         8     88       -42         1764
4         8    118       -12          144
5        12    117       -13          169
6        16    137         7           49
7        20    157        27          729
8        20    169        39         1521
9        22    149        19          361
10       26    202        72         5184
Total                                15730
SST = 15730 measures the error in using the mean of the y values to predict sales.
In the pizza example:
SST = 15730
SSE = 1530
SSR = SST - SSE = 15730 - 1530 = 14200
Coefficient of determination: r² = SSR/SST, with 0 ≤ r² ≤ 1
r² = 14200/15730 = 0.9027
In other words, about 90% of the variation in y can be explained by the regression line.
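The same three sums of squares and the coefficient of determination can be computed directly from the data. A minimal sketch, assuming the estimated equation Ŷ = 60 + 5X from the example:

```python
# Data from the restaurant example.
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

y_bar = sum(ys) / len(ys)                                   # mean of Y
sst = sum((y - y_bar) ** 2 for y in ys)                     # error using the mean
sse = sum((y - (60 + 5 * x)) ** 2 for x, y in zip(xs, ys))  # error using the line
ssr = sst - sse                                             # part explained by the line
r_squared = ssr / sst

print(sst, sse, ssr, round(r_squared, 4))  # 15730.0 1530 14200.0 0.9027
```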
SST Calculations
$$SST = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$
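The shortcut form of SST follows from expanding the square. A brief derivation, using only the definition $\bar{Y} = \sum Y / n$:

$$\sum (Y - \bar{Y})^2 = \sum Y^2 - 2\bar{Y} \sum Y + n\bar{Y}^2 = \sum Y^2 - \frac{2(\sum Y)^2}{n} + \frac{(\sum Y)^2}{n} = \sum Y^2 - \frac{(\sum Y)^2}{n}$$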
Observation     Xi      Yi^2
1                2      3364
2                6     11025
3                8      7744
4                8     13924
5               12     13689
6               16     18769
7               20     24649
8               20     28561
9               22     22201
10              26     40804
Total                 184730
SST = 184730 - (1300)²/10 = 184730 - 169000 = 15730
SSR Calculations
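$$SSR = \frac{\left[ \sum XY - \frac{(\sum X)(\sum Y)}{n} \right]^2}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

For the pizza data: SSR = (2840)²/568 = 14200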
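As a sanity check, the shortcut formula for SSR can be compared with the direct definition, the sum of squared deviations of the fitted values from the mean, Σ (Ŷi - Ȳ)². A minimal sketch on the restaurant data, assuming the fitted line Ŷ = 60 + 5X:

```python
# Data from the restaurant example.
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Shortcut formula: squared numerator of b1 over the denominator of b1.
numerator = sum_xy - sum_x * sum_y / n   # 2840.0
denominator = sum_x2 - sum_x ** 2 / n    # 568.0
ssr_shortcut = numerator ** 2 / denominator

# Direct definition: squared distance of each fitted value from the mean of Y.
y_bar = sum_y / n
ssr_direct = sum((60 + 5 * x - y_bar) ** 2 for x in xs)

print(ssr_shortcut, ssr_direct)  # 14200.0 14200.0
```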
$$r^2 = \frac{SSR}{SST} = \frac{14200}{15730} = 0.9027$$

$$SSE = SST - SSR = 15730 - 14200 = 1530$$
We need to calculate ΣX, ΣY, ΣXY, ΣX², ΣY²:

          X      Y     XY    X^2    Y^2
          1     14     14      1    196
          3     24     72      9    576
          2     18     36      4    324
          1     17     17      1    289
          3     27     81      9    729
Total    10    100    220     24   2114
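b1 = (220 - (10)(100)/5) / (24 - (10)²/5) = 20/4 = 5
SSR = (20)²/4 = 100
SST = 2114 - (100)²/5 = 114
SSE = SST - SSR = 114 - 100 = 14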
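The five column totals are all that is needed to fit the line and measure its fit for this small data set. The following is a minimal sketch that works only from those sums; the variable names are illustrative.

```python
# Five-observation example: X and Y columns from the table.
xs = [1, 3, 2, 1, 3]
ys = [14, 24, 18, 17, 27]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)               # 10, 100
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 220
sum_x2 = sum(x * x for x in xs)               # 24
sum_y2 = sum(y * y for y in ys)               # 2114

sxy = sum_xy - sum_x * sum_y / n              # numerator of b1
sxx = sum_x2 - sum_x ** 2 / n                 # denominator of b1

b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n
sst = sum_y2 - sum_y ** 2 / n
ssr = sxy ** 2 / sxx
sse = sst - ssr
r_squared = ssr / sst

print(b0, b1, sst, ssr, sse, round(r_squared, 4))
```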
$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$
The correlation coefficient is a measure of the strength of the linear association between two variables. It takes a value between -1 and +1.
rxy = +1: the two variables are perfectly related through a line with positive slope.
rxy = -1: the two variables are perfectly related through a line with negative slope.
rxy = 0: the two variables are not linearly related.
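For the restaurant data, the correlation coefficient can be obtained from r² and the sign of b1, and cross-checked against the usual product-moment form Sxy / sqrt(Sxx · Syy). This is a minimal sketch; the names `sxy`, `sxx` and `syy` are shorthand introduced here for the corrected sums, not notation from the slides.

```python
import math

# Data from the restaurant example.
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(xs)

sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
syy = sum(y * y for y in ys) - sum(ys) ** 2 / n

b1 = sxy / sxx
r_squared = sxy ** 2 / (sxx * syy)   # equals SSR / SST

# r_xy takes the sign of the slope and the magnitude sqrt(r^2).
r_from_r_squared = math.copysign(math.sqrt(r_squared), b1)

# Cross-check with the product-moment form.
r_direct = sxy / math.sqrt(sxx * syy)

print(round(r_from_r_squared, 4), round(r_direct, 4))
```

Both values agree and are positive, matching the positive slope b1 = 5.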
Exercise
Given the following experimental data on rice yield (t/ha), plant height (cm) and tiller number, determine the relationships of these variables with each other using correlation and regression analysis. Obtain a model relating YIELD to the variables PLTHT and TILLER# and interpret results. Test for the significance of the parameter estimates and the regression equation. Evaluate the adequacy of the model obtained.
Happy studying!