
Fall 2006 IENG 314 Project

December 8th 2006

Presented To:
Dr Wafik Iskander

Presented By:
Clay Holmes
Scott Maston
Tucker Whittaker

The management of Disney World in Orlando, Florida is interested in estimating
the daily income from the theme parks for any single day in the month of June.
Data were collected for the last 12 years and adjusted to today's dollars. Data were
collected for the following variables:

Y = Daily Gross Income ($1,000)
X1 = Year of Data Collection
X2 = Day of the Week
X3 = Highest Temperature Recorded the Day Before
X4 = Highest Temp. Forecasted for the Day
X5 = Chance of Precipitation
X6 = Relative Humidity
X7 = Volume of Gas Sold in the Previous Day in Gas Stations within 10 Mile Radius (million gallons)
X8 = Total Number of Persons Arriving at Orlando Airport in Previous 3 Days (1,000 persons)

The first step of the project was to clean the data by removing the outliers, which
gives a more accurate estimate of the model. To do so, we plotted Y against each
variable and deleted the outlying observations from the data set.

[Figure: Y vs X1, before data cleanup of X1]

Data Removed (columns: Y, X1 to X8)
9045 1997 2 86 85 97 59 2.64 327.7
3525 2000 3 85 83 39 31 4.1 244.6
[Figure: Y vs X1, after data cleanup of X1]

[Figure: Y vs X2]

[Figure: Y vs X3]

[Figure: Y vs X4]

[Figure: Y vs X5, before data cleanup of X5]

Data Removed
346.7 1999 3 80 81 197 55 3.42 367.3

[Figure: Y vs X5, after data cleanup of X5]

[Figure: Y vs X6]

[Figure: Y vs X7, before data cleanup of X7]

Data Removed
454.2 2003 3 72 67 100 71 41.7 395.3
389.8 2006 3 82 79 69 41 39.3 284.8
[Figure: Y vs X7, after data cleanup of X7]

[Figure: Y vs X8, before data cleanup of X8]

Data Removed
379.4 2002 3 80 80 96 55 3.5 981.7

[Figure: Y vs X8, after data cleanup of X8]

After the data cleanup, the total number of observations was n = 354. Of these, 200
observations were used for model development and 154 for the validation portion. We
then split X2 (day of the week) into the indicator variables Xsat and Xsun, knowing
that a day that is neither Saturday nor Sunday has to be a weekday. The variables were
then transformed by taking the inverse, the natural log, and the square of each
variable to see whether the transformed variables contributed to the model. Lastly, we
used the stepwise function of SAS to find the best variables to use in the model. The
best eight-variable model is as follows:

Number in Model   R-Square   Variables in Model
8   0.8154   Xsat Xsun Xtempforc Xgas Xairport SQRTXtempforc INVXtempforc INVXrain
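
The variable construction described above can be sketched roughly as follows. This is only an illustration, not the SAS code used in the project: the report does not give the numeric coding of X2, so the day is passed as a string here, and the `LN` and `SQ` name prefixes are hypothetical (only the `INV` and `SQRT` prefixes appear in the model listing).

```python
import math

def day_indicators(day):
    """Return (Xsat, Xsun); both are 0 for a weekday.

    ASSUMPTION: the day is given as a string such as "sat" or "wed".
    The report's own numeric coding of X2 is not stated.
    """
    day = day.strip().lower()
    return (1 if day.startswith("sat") else 0,
            1 if day.startswith("sun") else 0)

def transforms(name, x):
    """Candidate transformations of one positive-valued predictor."""
    return {
        "INV" + name: 1.0 / x,          # inverse
        "LN" + name: math.log(x),       # natural log (hypothetical name)
        "SQ" + name: x * x,             # square (hypothetical name)
        "SQRT" + name: math.sqrt(x),    # square root, as in SQRTXtempforc
    }

xsat, xsun = day_indicators("Wednesday")
candidates = transforms("Xtempforc", 81.0)
candidates.update(transforms("Xrain", 50.0))
print(xsat, xsun, candidates["INVXrain"], candidates["SQRTXtempforc"])
```

The candidate columns would then be offered to a variable-selection procedure, which keeps only those that improve the fit.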
The data then needed to be cleaned once again using DFFITS. Based on the rule of
thumb that any observation with |DFFITS| > 2*sqrt(p/n) is an outlier, with p = 9
parameters and n = 200 observations, we determined that any observation with
|DFFITS| > .424 would be removed. The following points were removed:
Obs   Std Err Residual   Student Residual   -2-1 0 1 2   Cook's D   RStudent   Hat Diag H   Cov Ratio   DFFITS
20 0.0112 0.0478 | | | 786.907 0.0477 1.0000 3249310 83.9355
50 17.664 0.924 | |* | 0.023 0.9236 0.1946 1.2503 0.4540
57 19.232 -2.112 | ****| | 0.024 -2.1315 0.0453 0.8877 -0.4643
64 19.271 2.586 | |***** | 0.032 2.6252 0.0414 0.7937 0.5459
76 19.265 2.565 | |***** | 0.032 2.6040 0.0420 0.7982 0.5453
146 18.981 -2.690 | *****| | 0.061 -2.7356 0.0700 0.7963 -0.7506
178 15.602 1.792 | |*** | 0.211 1.8023 0.3717 1.4325 1.3863
191 19.217 -2.390 | ****| | 0.031 -2.4203 0.0468 0.8369 -0.5364
194 18.469 -2.593 | *****| | 0.101 -2.6329 0.1195 0.8625 -0.9701
Actual Data Removed
20 401.9 1996 0 0 1 83 10 2 13 3.14 312.9
50 326.3 2004 0 0 1 92 95 76 49 3.37 278.5
57 366.9 2001 0 0 1 76 74 48 36 4 218.2
64 476 2001 0 1 0 83 84 69 47 3.71 345
76 485.9 1995 0 1 0 78 80 44 24 3.68 265.1
146 409.2 1997 0 1 0 74 74 84 48 3.58 208.3
178 519.3 2002 1 0 0 83 85 1 14 3.15 315.9
191 386.3 1997 1 0 0 85 83 58 49 3.11 350.7
194 375.8 1995 0 1 0 85 87 2 15 2.91 270.8
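
As a consistency check on the diagnostics listing, DFFITS can be recomputed from two other values in each row using the standard identity DFFITS = t * sqrt(h / (1 - h)), where t is the externally studentized residual and h is the hat-diagonal element. The sketch below assumes that the fourth-from-last and third-from-last numbers in each row are t and h, and that the last number is DFFITS. That reading is an interpretation of the listing, not something the report states.

```python
import math

def dffits_from_rstudent(rstudent, hat):
    """DFFITS from the externally studentized residual and the leverage."""
    return rstudent * math.sqrt(hat / (1.0 - hat))

# (observation, t, h, DFFITS as listed) copied from three rows of the listing
rows = [
    (50, 0.9236, 0.1946, 0.4540),
    (57, -2.1315, 0.0453, -0.4643),
    (64, 2.6252, 0.0414, 0.5459),
]

for obs, t, h, listed in rows:
    recomputed = dffits_from_rstudent(t, h)
    # Small differences are expected because the listed inputs are rounded.
    print(obs, round(recomputed, 4), listed)
```

The recomputed values agree with the listed ones to about three decimal places, which supports that reading of the columns.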

The data were then cleaned once again using DFFITS. This time, any observation with
|DFFITS| > .434 was removed. The following points were removed:
100 17.425 -1.955 | ***| | 0.029 -1.9705 0.0632 0.9267 -0.5118
111 17.308 -1.606 | ***| | 0.023 -1.6135 0.0757 0.9998 -0.4618
112 17.695 -2.650 | *****| | 0.027 -2.6955 0.0340 0.7633 -0.5053
115 17.702 2.581 | |***** | 0.025 2.6220 0.0331 0.7771 0.4852
120 16.517 -1.205 | **| | 0.030 -1.2067 0.1583 1.1616 -0.5233
136 17.441 1.754 | |*** | 0.022 1.7643 0.0615 0.9604 0.4517
186 17.507 2.066 | |**** | 0.027 2.0845 0.0543 0.8975 0.4994
188 16.686 -1.141 | **| | 0.024 -1.1421 0.1409 1.1467 -0.4626

ACTUAL DATA
100 411.2 1996 1 0 0 82 84 6 10 3.42 358.1
111 364.2 1997 1 0 0 86 88 98 51 2.69 235.9
112 307.4 2002 0 0 1 86 84 35 29 3.87 208.1
115 396.1 1995 0 0 1 83 86 26 23 3.95 231.1
120 403.6 2005 0 0 1 73 71 21 18 3.17 298.5
136 472.4 2004 0 1 0 83 83 30 19 4.14 360.1
186 451.4 1998 1 0 0 85 88 11 12 3.2 313.6
188 341.1 1999 0 0 1 84 87 2 15 2.95 279.6

The data then needed to be cleaned once again using DFFITS. Any observation with
|DFFITS| > .443 was removed. The following points were removed:
32 9.080 0.403 | | | 0.044 0.4017 0.7102 3.6041 0.6289
99 16.359 -1.811 | ***| | 0.023 -1.8231 0.0593 0.9434 -0.4579
172 10.918 0.675 | |* | 0.070 0.6739 0.5810 2.4550 0.7935

ACTUAL DATA
385.5 2005 0 0 1 93 88 1 20 2.54 206.4
389.5 1999 1 0 0 86 86 19 24 3.37 236.2
319.8 1999 0 0 1 92 95 85 56 3.69 325.5

The data then needed to be cleaned once again using DFFITS. Any observation with
|DFFITS| > .447 was removed. The following points were removed:
26 16.564 -2.572 | *****| | 0.023 -2.6160 0.0300 0.7619 -0.4603
44 16.325 -2.556 | *****| | 0.045 -2.5990 0.0578 0.7880 -0.6440
52 15.532 1.273 | |** | 0.031 1.2756 0.1471 1.1345 0.5298
61 15.901 1.929 | |*** | 0.049 1.9445 0.1061 0.9675 0.6699
150 15.115 0.926 | |* | 0.023 0.9257 0.1923 1.2475 0.4517
155 14.994 -0.916 | *| | 0.024 -0.9160 0.2052 1.2689 -0.4654

ACTUAL DATA
321.9 2005 0 0 1 84 85 33 20 4.01 334.4
323.7 1999 0 0 1 83 82 6 17 3.36 247.3
422.6 2000 0 0 1 80 78 4 18 3.75 317.9
354.4 1997 0 0 1 86 90 11 12 3.12 299.9
349.6 2000 0 0 1 88 91 91 61 3.96 367.8
294.8 2002 0 0 1 91 91 20 24 3.18 200.7

The data then needed to be cleaned once again using DFFITS. Any observation with
|DFFITS| > .454 was removed. The following points were removed:
23 13.594 0.732 | |* | 0.024 0.7307 0.2869 1.4385 0.4634
36 15.398 -1.577 | ***| | 0.026 -1.5844 0.0851 1.0070 -0.4833
94 15.433 -1.547 | ***| | 0.023 -1.5535 0.0809 1.0076 -0.4609
134 14.455 0.967 | |* | 0.025 0.9666 0.1938 1.2448 0.4739

ACTUAL DATA
381.7 2000 0 0 1 84 84 3 13 3.01 388.5
383.8 2004 1 0 0 86 86 53 35 2.87 212.5
379.8 1999 0 1 0 86 88 79 53 3.69 332.4
429.1 2000 1 0 0 89 90 44 27 4.03 328.3

The data then needed to be cleaned once again using DFFITS. Any observation with
|DFFITS| > .460 was removed. The following points were removed:
5 15.145 -1.376 | **| | 0.024 -1.3798 0.1030 1.0600 -0.4675
104 14.903 1.273 | |** | 0.027 1.2759 0.1314 1.1116 0.4962
ACTUAL DATA
355.8 2005 0 0 1 80 81 6 12 3.83 230.5
364.7 1995 0 0 1 86 89 91 62 4.06 385.2

The data then needed to be cleaned once again using DFFITS. Any observation with
|DFFITS| > .463 was removed. The following points were removed:
36 15.454 -1.904 | ***| | 0.024 -1.9202 0.0567 0.9117 -0.4707
163 13.887 1.064 | |** | 0.039 1.0642 0.2383 1.3030 0.5952
ACTUAL DATA
303 2006 0 0 1 83 84 82 58 2.53 224.5
321.6 1999 0 0 1 88 90 54 35 2.71 383.7

The model was then checked again using DFFITS, but no points with |DFFITS| > .466
were found, so we proceeded with our model development.
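
The way the cutoff rises from round to round follows from the rule of thumb 2*sqrt(p/n) as observations are deleted. The sketch below assumes p = 9 (eight predictors plus the intercept) and takes the number of points removed in each round from the listings above. Both are inferences from the report rather than values it states directly.

```python
import math

def dffits_cutoff(p, n):
    """Rule-of-thumb DFFITS cutoff for p parameters and n observations."""
    return 2.0 * math.sqrt(p / n)

p = 9                                     # ASSUMPTION: 8 predictors + intercept
n = 200                                   # model-development observations
removed_per_round = [9, 8, 3, 6, 4, 2, 2]  # counted from the listings above
reported = [0.424, 0.434, 0.443, 0.447, 0.454, 0.460, 0.463, 0.466]

cutoffs = [dffits_cutoff(p, n)]
for k in removed_per_round:
    n -= k                                # observations left after this round
    cutoffs.append(dffits_cutoff(p, n))

for computed, stated in zip(cutoffs, reported):
    print(round(computed, 4), stated)
```

Under these assumptions the computed cutoffs match the reported ones to within about 0.001, and 166 observations remain for model development.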
To find the best possible model with the desired number of variables, PROC
RSQUARE was run to reduce the model to five or six variables. We came up with the
following results:
5 Variable Model (R-square = 0.8561):
Xsat Xsun Xtempforc Xgas INVXrain

6 Variable Model (R-square = 0.8618):
Xsat Xsun Xtempforc Xgas Xairport INVXrain

Using the 6 Variable Model, Parameter estimates are:

Intercept 680.6614
Xsat 80.56435
Xsun 62.8493
Xtempforc -4.91733
Xgas 19.44379
Xairport 0.058996
INVXrain 71.65425
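
As an illustration, the fitted six-variable equation can be evaluated directly from these estimates. The function name and argument names below are hypothetical, and INVXrain is taken to be 1 divided by the chance of precipitation in percent, which is how the inverse values appear in the data rows listed later in the report.

```python
# Parameter estimates of the six-variable model, copied from the table above.
COEF = {
    "Intercept": 680.6614,
    "Xsat": 80.56435,
    "Xsun": 62.8493,
    "Xtempforc": -4.91733,
    "Xgas": 19.44379,
    "Xairport": 0.058996,
    "INVXrain": 71.65425,
}

def predict_income(xsat, xsun, tempforc, gas, airport, rain_pct):
    """Predicted daily gross income in $1,000 (hypothetical helper).

    ASSUMPTION: INVXrain = 1 / rain_pct, with rain_pct in percent.
    """
    return (COEF["Intercept"]
            + COEF["Xsat"] * xsat
            + COEF["Xsun"] * xsun
            + COEF["Xtempforc"] * tempforc
            + COEF["Xgas"] * gas
            + COEF["Xairport"] * airport
            + COEF["INVXrain"] * (1.0 / rain_pct))

# A weekday with forecast 77, gas 2.95, airport 399.3, rain 82, taken from a
# row listed later in the report, where the predicted value is given as 383.82.
print(round(predict_income(0, 0, 77, 2.95, 399.3, 82), 2))
```

The result agrees with the predicted value of 383.82 listed for that row, which suggests the equation is being read correctly.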

For our model to be accurate, MSPR should be close to MSE. The comparison
found in our model was MSPR = 475 > MSE = 250, showing that our model was not
completely accurate. We then removed observations whose hat-diagonal element Hii
exceeded .1038, a cutoff computed using the formula 2p/n. The following data were
removed:

37 14.174 0.190 | | | 0.002 0.1891 0.2291 1.3537 0.1031
89 15.154 -0.580 | *| | 0.006 -0.5787 0.1188 1.1687 -0.2125

Actual Data
411 0 0 1 75 78 70 45 3.04 226.5
345.7 0 0 1 77 80 78 50 3.21 248.3

Y   Xsat   Xsun   Xtempforc   Xrain   INVXrain   Xgas   Xairport   Predicted Y   Squared Error
334.7 0 0 77 82 0.012195 2.95 399.3 383.82 2412.47
428.4 0 0 77 76 0.013158 3.15 340.4 384.30 1944.83
453.2 0 1 89 90 0.011111 3.9 257.3 397.67 3083.05
299.5 0 0 85 36 0.027778 3.84 201.5 351.23 2676.04
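
The MSPR is the average squared difference between the observed validation values and the model's predictions. The sketch below uses only the four rows listed above, whose last two numbers appear to be the predicted value and the squared prediction error. Because it covers four rows rather than the full validation set, its result is not the MSPR quoted in the report.

```python
def mspr(actual, predicted):
    """Mean squared prediction error over a validation set."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / len(actual)

# Observed and predicted income ($1,000) from the four rows listed above.
actual = [334.7, 428.4, 453.2, 299.5]
predicted = [383.82, 384.30, 397.67, 351.23]

squared_errors = [(y - yh) ** 2 for y, yh in zip(actual, predicted)]
print([round(e, 2) for e in squared_errors])
print(round(mspr(actual, predicted), 2))
```

The squared errors differ slightly from the listed ones because the predictions above are rounded to two decimals.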
Our new comparison was MSPR = 455 > MSE = 250, showing that our model was
still not completely accurate. Data were again removed where the hat-diagonal
element Hii exceeded .1052, computed using the same formula. The following data
were removed:
DATA
1 32.276 0.258 | | | 0.002 0.2561 0.1481 1.2913 0.1068
3 31.818 2.233 | |**** | 0.148 2.3009 0.1720 0.7915 1.0488

The new comparison was MSPR = 462 < MSE = 470, showing that our model
was accurate enough to predict a new value of Y with a 90% Confidence
Interval, which is shown below:

MSE = 470.91062
Xh'(X'X)^-1 Xh = 0.020868815
S*^2 = MSE * (1 + Xh'(X'X)^-1 Xh) = 480.7379667
y-hat = 321.0516941

90% CI:
284.7865242 < Y(new) < 357.3169
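
The interval arithmetic can be retraced from the quantities above. The variance term is MSE * (1 + Xh'(X'X)^-1 Xh), which is the form used for predicting a single new observation. The t multiplier is not given in the report, so the value 1.654 below is an assumption back-calculated from the reported limits.

```python
import math

mse = 470.91062          # from the report
leverage = 0.020868815   # Xh'(X'X)^-1 Xh, from the report
y_hat = 321.0516941      # predicted income in $1,000, from the report
t_mult = 1.654           # ASSUMPTION: inferred from the reported limits

s2_pred = mse * (1.0 + leverage)        # variance for a new observation
half_width = t_mult * math.sqrt(s2_pred)
lower, upper = y_hat - half_width, y_hat + half_width

print(round(s2_pred, 4), round(lower, 2), round(upper, 2))
```

With this multiplier the limits agree with the reported 284.79 and 357.32 to two decimal places.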

Based on the given results, a 90% Confidence Interval can be obtained to find the
Daily Gross Income for the following conditions:

Day of the Week: Wednesday
Highest Temp. Recorded the Day Before: 83 degrees
Highest Temp. Forecasted for the Day: 81 degrees
Chance of Precipitation: 50%
Relative Humidity: 77%
Volume of Gas Sold Previous Day, Stations within 10 Mile Radius: 2.85 Mil. Gallons
Total Number of Persons Arriving at Orlando Airport Previous 3 Days: 253.7 (1000 persons)

For these conditions, it can be expected, with 90% confidence, that the daily
gross income will be between 284.787 and 357.317 thousand dollars (about $284,787
to $357,317).
