
Chapter 6: How to Do Forecasting by Regression Analysis

Introduction: Regression is the study of relationships among variables, a principal purpose of which is to predict, or estimate, the value of one variable from known or assumed values of other variables related to it.

Variables of Interest: To make predictions or estimates, we must identify the effective predictors of the variable of interest: which variables are important indicators and can be measured at the least cost? Which carry only a little information? And which are redundant?

Predicting the Future: Predicting a change over time or extrapolating from present conditions to future conditions is not the function of regression analysis. To make estimates of the future, use time series analysis.

Experiment: Begin with a hypothesis about how several variables might be related to another variable and about the form of the relationship.

Simple Linear Regression: A regression using only one predictor is called a simple regression.

Multiple Regression: Where there are two or more predictors, multiple regression analysis is employed.

Data: Since it is usually unrealistic to obtain information on an entire population, a sample, which is a subset of the population, is usually selected. For example, a sample may be either randomly selected, or a researcher may choose the x-values based on the capability of the equipment utilized in the experiment or on the experimental design. Where the x-values are pre-selected, usually only limited inferences can be drawn, depending upon the particular values chosen. When both x and y are randomly drawn, inferences can generally be drawn over the range of values in the sample.

Scatter Diagram: A graphical representation of the pairs of data, called a scatter diagram, can be drawn to gain an overall view of the problem. Is there an apparent relationship? Direct? Inverse? If the points lie within a band described by parallel lines, we can say there is a linear relationship between the pair of x and y values. If the rate of change is generally not constant, then the relationship is curvilinear.

The Model: If we have determined there is a linear relationship between x and y, we want a linear equation stating y as a function of x in the form y = a + bx + e, where a is the intercept, b is the slope, and e is the error term accounting for variables that affect y but are not included as predictors, and/or otherwise unpredictable and uncontrollable factors.
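As a small illustration of the model just described, the following sketch (not part of the original handout) simulates pairs (x, y) from a hypothetical line y = a + bx + e and prints them; plotting such pairs is exactly what the scatter diagram step examines. The values of a, b, and the noise level are arbitrary choices for illustration.

```python
# Minimal sketch: simulate data from the hypothesized model y = a + b*x + e.
# The intercept, slope and error standard deviation below are illustrative only.
import random

a, b = 1.0, 2.5                       # hypothetical intercept and slope
xs = [float(x) for x in range(1, 11)]
ys = [a + b * x + random.gauss(0, 1.0) for x in xs]   # e ~ N(0, 1)

for x, y in zip(xs, ys):
    print(f"x = {x:4.1f}   y = {y:6.2f}")   # these pairs form the scatter diagram
```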


Least-Squares Method: To predict the mean y-value for a given x-value, we need a line which passes through the mean value of both x and y and which minimizes the sum of the distances between each of the points and the predictive line. Such an approach should result in a line which we can call a "best fit" to the sample data. The least-squares method achieves this result by calculating the minimum average squared deviations between the sample y points and the estimated line. A procedure is used for finding the values of a and b which reduces to the solution of simultaneous linear equations. Shortcut formulas have been developed as an alternative to the solution of simultaneous equations.

Solution Methods: Techniques of matrix algebra can be employed to solve simultaneous linear equations manually. When performing manual computations, this technique is especially useful when there are more than two equations and two unknowns. Several well-known computer packages are widely available and can be utilized to relieve the user of the computational problem, all of which can be used to solve both linear and polynomial equations: the BMD packages (Biomedical Computer Programs) from UCLA; SPSS (Statistical Package for the Social Sciences), developed by the University of Chicago; and SAS (Statistical Analysis System). Another package that is also available is IMSL, the International Mathematical and Statistical Libraries, which contains a great variety of standard mathematical and statistical calculations. All of these software packages use matrix algebra to solve simultaneous equations.

Use and Interpretation of the Regression Equation: The equation developed can be used to predict an average value over the range of the sample data. The forecast is good for short to medium ranges.

Measuring Error in Estimation: The scatter or variability about the mean value can be measured by calculating the variance, the average squared deviation of the values around the mean. The standard error of estimate is derived from this value by taking the square root. This value is interpreted as the average amount that actual values differ from the estimated mean.

Confidence Interval: Interval estimates can be calculated to obtain a measure of the confidence we have in our estimate that a relationship exists. These calculations are made using t-distribution tables. From these calculations we can derive confidence bands, a pair of non-parallel lines, narrowest at the mean values, which express varying degrees of confidence in the band of values surrounding the regression equation.

Assessment: How confident can we be that a relationship actually exists? The strength of that relationship can be assessed by statistical tests of that hypothesis, such as the null hypothesis, which are established using t-distribution, R-squared, and F-distribution tables. These calculations give rise to the standard error of the regression coefficient, an estimate of the amount that the regression coefficient b will vary from sample to sample of the same size from the same population. An Analysis of Variance (ANOVA) table can be generated which summarizes the different components of variation.
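The matrix-algebra route mentioned under Solution Methods can be sketched as follows: the least-squares coefficients are the solution of the normal equations (X'X)b = X'y. This is a minimal illustration, not part of the original handout; the data values are made up and NumPy is assumed to be available.

```python
# Sketch: solving the least-squares normal equations with matrix algebra.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # illustrative predictor values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # illustrative responses

X = np.column_stack([np.ones_like(x), x])    # design matrix: intercept column, then x
beta = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations (X'X) beta = X'y
print("intercept a =", beta[0], " slope b =", beta[1])
```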


When you want to compare models of different size (different numbers of independent variables and/or different sample sizes) you must use the Adjusted R-Squared, because the usual R-Squared tends to grow with the number of independent variables.

The Standard Error of Estimate, i.e., the square root of the error mean square, is a good indicator of the "quality" of a prediction model, since it "adjusts" the Mean Error Sum of Squares (MESS) for the number of predictors in the model as follows:

MESS = Error Sum of Squares / (N - Number of Linearly Independent Predictors)

If one keeps adding useless predictors to a model, the MESS will become less and less stable. R-squared is also influenced by the range of your dependent variable; so, if two models have the same residual mean square but one model has a much narrower range of values for the dependent variable, that model will have a lower R-squared, even though both models do equally well for prediction purposes.

You may like using the Regression Analysis with Diagnostic Tools JavaScript to check your computations, and to perform some numerical experimentation for a deeper understanding of these concepts.
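A rough sketch of the quantities just discussed, assuming the divisor in MESS counts the intercept among the linearly independent predictors (so it is n - 2 for a straight-line fit). The function and variable names below are illustrative choices, not from the text.

```python
# Sketch: MESS, standard error of estimate, R-squared and adjusted R-squared.
def fit_statistics(y, y_hat, p):
    """y: observed values; y_hat: model predictions; p: number of linearly
    independent predictors, counting the intercept (p = 2 for a straight line)."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    mess = sse / (n - p)                                     # MESS, as defined above
    see = mess ** 0.5                                        # standard error of estimate
    r2 = 1 - sse / sst                                       # usual R-squared
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)                # adjusted R-squared
    return mess, see, r2, adj_r2
```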


Predictions by Regression

Regression analysis has three goals: predicting, modeling, and characterization. What would be the logical order in which to tackle these three goals such that one task leads to and/or justifies the other tasks? Clearly, it depends on what the prime objective is. Sometimes you wish to model in order to get better prediction; then the order is obvious. Sometimes, you just want to understand and explain what is going on; then modeling is the key, though out-of-sample predicting may be used to test any model. Often modeling and predicting proceed in an iterative way and there is no 'logical order' in the broadest sense. You may model to get predictions, which enable better control, but iteration is again likely to be present and there are sometimes special approaches to control problems.

The following contains the main essential steps during modeling and analysis of regression model building, presented in the context of an applied numerical example.

Formulas and Notations:

x̄ = Σx / n   (the mean of the x values)
ȳ = Σy / n   (the mean of the y values)
Sxx = SSxx = Σ(x(i) - x̄)² = Σx² - (Σx)² / n
Syy = SSyy = Σ(y(i) - ȳ)² = Σy² - (Σy)² / n
Sxy = SSxy = Σ(x(i) - x̄)(y(i) - ȳ) = Σx·y - (Σx)(Σy) / n
Slope m = SSxy / SSxx
Intercept b = ȳ - m·x̄
y-predicted = yhat(i) = m·x(i) + b
Residual(i) = Error(i) = y(i) - yhat(i)
SSE = SSres = SSerror = Σ[y(i) - yhat(i)]²
Standard deviation of residuals = s = Sres = Serror = [SSE / (n - 2)]^(1/2)
Standard error of the slope (m) = Sres / SSxx^(1/2)
Standard error of the intercept (b) = Sres · [(SSxx + n·x̄²) / (n·SSxx)]^(1/2)
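The shortcut formulas above translate directly into a small helper function. This is a sketch added for illustration (the function and variable names are not from the handout); it returns the slope, intercept, and the error measures defined above.

```python
# Sketch: the shortcut least-squares formulas implemented as one helper.
import math

def simple_regression(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n                   # SSxx
    syy = sum(yi * yi for yi in y) - sum_y ** 2 / n                   # SSyy
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n    # SSxy
    m = sxy / sxx                                                     # slope
    b = sum_y / n - m * sum_x / n                                     # intercept
    y_hat = [m * xi + b for xi in x]                                  # predictions
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))             # SSE
    s = math.sqrt(sse / (n - 2))                                      # std dev of residuals
    se_m = s / math.sqrt(sxx)                                         # std error of slope
    se_b = s * math.sqrt((sxx + n * (sum_x / n) ** 2) / (n * sxx))    # std error of intercept
    return {"m": m, "b": b, "SSyy": syy, "SSE": sse, "s": s, "SE_m": se_m, "SE_b": se_b}
```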


An Application: A taxicab company manager believes that the monthly repair costs (Y) of cabs are related to the age (X) of the cabs. Five cabs are selected at random, and from their records we obtained the following data: (x, y) = {(2, 2), (3, 5), (4, 7), (5, 10), (6, 11)}. Based on our practical knowledge and the scatter diagram of the data, we hypothesize a linear relationship between the predictor X and the cost Y.

Now the question is how we can best (i.e., in the least-squares sense) use the sample information to estimate the unknown slope (m) and intercept (b). The first step in finding the least-squares line is to construct a sum of squares table to find the sums of the x values (Σx), the y values (Σy), the squares of the x values (Σx²), the squares of the y values (Σy²), and the cross-products of the corresponding x and y values (Σxy), as shown in the following table:

        x      y      x²     xy     y²
        2      2      4      4      4
        3      5      9      15     25
        4      7      16     28     49
        5      10     25     50     100
        6      11     36     66     121
Sum:    20     35     90     163    299

The second step is to substitute the values of Σx, Σy, Σx², Σxy, and Σy² into the following formulas:

SSxy = Σxy - (Σx)(Σy)/n = 163 - (20)(35)/5 = 163 - 140 = 23
SSxx = Σx² - (Σx)²/n = 90 - (20)²/5 = 90 - 80 = 10
SSyy = Σy² - (Σy)²/n = 299 - (35)²/5 = 299 - 245 = 54

Use the first two values to compute the estimated slope:

Slope m = SSxy / SSxx = 23 / 10 = 2.3

To estimate the intercept of the least-squares line, use the fact that the graph of the least-squares line always passes through the point (x̄, ȳ); therefore:

Intercept b = ȳ - m·x̄ = (Σy)/5 - (2.3)(Σx)/5 = 35/5 - (2.3)(20/5) = -2.2

Therefore the least-squares line is:

y-predicted = yhat = mx + b = -2.2 + 2.3x.
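The hand computation above can be checked with a few lines of code. Below is a standalone sketch (not part of the handout) that reproduces m = 2.3 and b = -2.2 directly from the shortcut formulas:

```python
# Standalone check of the taxicab example: slope and intercept by the shortcut formulas.
x = [2, 3, 4, 5, 6]
y = [2, 5, 7, 10, 11]
n = len(x)
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n   # 163 - 140 = 23
sxx = sum(a * a for a in x) - sum(x) ** 2 / n                  # 90 - 80 = 10
m = sxy / sxx                         # 2.3
b = sum(y) / n - m * sum(x) / n       # 7 - 9.2 = -2.2
print(f"yhat = {b:.1f} + {m:.1f} x")  # yhat = -2.2 + 2.3 x
```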


After estimating the slope and the intercept, the question is how we determine statistically whether the model is good enough, say for prediction. The standard error of the slope is:

Standard error of the slope (m) = Sm = Sres / SSxx^(1/2)

and its relative precision is measured by the statistic:

t-slope = m / Sm

For our numerical example, it is:

t-slope = 2.3 / [(0.6055) / (10)^(1/2)] = 12.01

which is large enough, an indication that the fitted model is a "good" one.

You may ask, in what sense is the least-squares line the "best-fitting" straight line to the 5 data points? The least-squares criterion chooses the line that minimizes the sum of squared vertical deviations, i.e., of the residuals (error = y - yhat):

SSE = Σ(y - yhat)² = Σ(error)² = 1.1

The numerical value of SSE is obtained from the following computational table for our numerical example:

Predictor x   Observed y   Predicted yhat = -2.2 + 2.3x   Error = y - yhat   Squared error
2             2            2.4                            -0.4               0.16
3             5            4.7                             0.3               0.09
4             7            7.0                             0.0               0.00
5             10           9.3                             0.7               0.49
6             11           11.6                           -0.6               0.36
                                                           Sum = 0            Sum = 1.1

Alternatively, one may compute SSE by:

SSE = SSyy - m·SSxy = 54 - (2.3)(23) = 54 - 52.9 = 1.1, as expected.

Notice that this value of SSE agrees with the value computed directly from the above table. The numerical value of SSE gives the estimate of the variation of the errors, s²:

s² = SSE / (n - 2) = 1.1 / (5 - 2) = 0.36667
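These error quantities, together with the t-statistic of the slope quoted above, can be reproduced with a short standalone sketch (the variable names are illustrative, not from the handout):

```python
# Sketch: residuals, SSE, s^2 and the slope t-statistic for the taxicab data.
x = [2, 3, 4, 5, 6]
y = [2, 5, 7, 10, 11]
n, m, b = len(x), 2.3, -2.2
y_hat = [m * xi + b for xi in x]                     # 2.4, 4.7, 7.0, 9.3, 11.6
errors = [yi - yh for yi, yh in zip(y, y_hat)]       # sum to 0
sse = sum(e * e for e in errors)                     # 1.1
s2 = sse / (n - 2)                                   # 0.36667
sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n     # 10
se_m = (s2 / sxx) ** 0.5                             # about 0.1915
t_slope = m / se_m                                   # about 12.01
print(round(sse, 4), round(s2, 5), round(t_slope, 2))
```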


The estimated value of the error variance is a measure of the variability of the y values about the estimated line. Clearly, we could also compute the estimated standard deviation s of the residuals by taking the square root of the variance s².

As the last step in the model building, the following Analysis of Variance (ANOVA) table is then constructed to assess the overall goodness-of-fit using the F-statistic:

Analysis of Variance Components

Source   DF   Sum of Squares   Mean Square   F Value   Prob > F
Model    1    52.90000         52.90000      144.273   0.0012
Error    3    SSE = 1.1        0.36667
Total    4    SSyy = 54

For practical purposes, the fit is considered acceptable if the F-statistic is more than five times the F-value from the F-distribution tables at the back of your textbook. Note that the criterion that the F-statistic must be more than five times the F-value from the F-distribution tables is independent of the sample size.

Notice also that there is a relationship between the two statistics that assess the quality of the fitted line, namely the t-statistic of the slope and the F-statistic in the ANOVA table. The relationship is:
(t-slope)² = F

This relationship can be verified for our computational example.
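A quick check of that relationship for the taxicab example (a sketch; the numbers are taken from the ANOVA table and the t-statistic computed earlier):

```python
# Sketch: verify that the ANOVA F-statistic equals the square of the slope t-statistic.
ss_total = 54.0                      # SSyy
sse = 1.1                            # error sum of squares
ms_model = (ss_total - sse) / 1      # model sum of squares / 1 df = 52.9
ms_error = sse / 3                   # error sum of squares / (n - 2) df = 0.36667
F = ms_model / ms_error              # about 144.27
t_slope = 12.01
print(round(F, 3), round(t_slope ** 2, 2))   # about 144.273 vs 144.24 (rounding)
```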

