
MULTIPLE REGRESSION – PREDICTION MODEL FOR APARTMENT SELLING PRICE IN DOWNTOWN MONTREAL AREA

MBA 608 – STATISTICAL MODELS FOR BUSINESS DECISIONS

Submitted to: Dr. D. Morin

Submitted by:

Tarek Jalloul 919 4444

Suiqing Zhang 6629857

SelvaKumar Raju 5620937


1. Introduction
The asking price of apartments for sale in downtown Montreal is a dependent variable that is
influenced by many other variables. In this project, we try to uncover the relationships between the
asking price of apartments in downtown Montreal and the variables described later on. The data were
collected in the main Montreal downtown area, that is, the neighborhoods surrounding Mount Royal,
mainly on its south side. The figures reflect the listings at the time of collection, early October
2012, and the data consist of both numerical and categorical variables (8 numerical and 1
categorical). The targeted area of the data collection was the Ville-Marie area in the heart of
downtown Montreal, which constitutes 71% of the available data. Because the listings collected from
the other areas were all in proximity to the target area, the area variable was dropped at the
proposal stage.

2. Multiple Linear Regressions


This project performs a complete multiple linear regression analysis in SPSS to predict an
apartment's asking price in the Montreal downtown area from the variables defined below. The model
can also be used to arrive at a reasonable asking price. Whereas realtors use experience and local
knowledge to value an apartment subjectively from its characteristics, this project predicts the
price from the relationships observed in the data. It can also be used to explore the relative value
of different apartment characteristics, for example, "Is an additional bathroom valued more than an
additional bedroom?"

The variables used are defined as follows:

• Asking Price: Continuous numerical variable which specifies the price demanded by the seller,
expressed in Canadian dollars ($CAN). This is the dependent variable to be predicted in the
regression.
• Living Space: Continuous numerical variable which specifies the total area of the apartment
offered for sale, in square feet.
• Year Built: Discrete numerical variable which specifies the year in which the building containing
the apartment was constructed.
• Number of Bedrooms: Discrete numerical variable which specifies the total number of bedrooms
included.
• Number of Bathrooms: Discrete numerical variable which specifies the total number of bathrooms
included.
• Number of Garages: Discrete numerical variable which specifies the total number of garages (or
indoor parking spaces) included.
• Story (level): Discrete numerical variable which specifies the story on which the apartment is
located.
• Extra Property Charges: Continuous numerical variable which includes all the extra annual fees
that accompany the apartment: School and Municipal Taxes, Co-Ownership Fees, Maintenance and any
other annual charges or fees. These charges mainly reflect the luxury level of the apartment, in the
sense that more luxurious apartments tend to have higher annual fees.

The boxplots of these data were presented in the proposal and revealed many useful characteristics
of the variables.

With the model built from this data set, we aim to predict the asking price of an apartment in the
downtown Montreal area from its specifications on the variables defined above. The analyses
performed on the data set will show which variables are most significant in setting the Asking
Price, and therefore which ones matter most when predicting the asking price of a given apartment.

The main characteristics of the variables are summarized below, after which the multiple regression
analysis is performed in stages as the insignificant variables are taken out.

Descriptive Statistics
Variable              Mean        Std. Deviation   N
Price                 696624.72   630259.805       101
Living space (X1)     1309.72     624.895          101
Years (X2)            35.10       34.231           101
No of bedroom (X3)    2.00        .949             101
No of bathroom (X4)   1.63        .703             101
Garage (X5)           1.08        .611             101
Story (X6)            4.87        5.528            101
Extra exp. (X7)       10321.50    8593.757         101

Correlation with Price
Variable        Correlation
Living Space    0.7773
Year Built      0.0861
Bedrooms        0.4422
Bathrooms       0.6629
Garages         0.5626
Story           0.7658
Extra Charges   0.8786

Multiple regression uses more than one independent variable to predict a dependent variable.

The multiple regression equation, estimated with the least-squares method, is:

Y = b0 + b1*X1 + b2*X2 + ... + bp*Xp, where X1 to Xp are the independent variables

All 7 independent variables are used in the first regression. The results of each regression are
shown in tables and then discussed. Two guidelines are applied throughout the elimination process:
any variable whose t-score is below 2 in absolute value (|t| < 2) is removed, and the adjusted
R-squared is monitored as variables are removed.
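
The elimination guideline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the analysis in this report was run in SPSS, the data below are synthetic (not the apartment listings), and the helper names `ols_t_stats` and `backward_eliminate` are hypothetical.

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit y = b0 + X @ b by least squares; return coefficients and t-scores."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])         # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - A.shape[1]                          # n - p - 1
    sigma2 = resid @ resid / dof                  # residual mean square
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef, coef / se

def backward_eliminate(X, y, names, t_cut=2.0):
    """Drop the predictor with the smallest |t|, one at a time, until all |t| >= t_cut."""
    keep = list(range(X.shape[1]))
    while keep:
        _, t = ols_t_stats(X[:, keep], y)
        abs_t = np.abs(t[1:])                     # ignore the intercept
        worst = int(np.argmin(abs_t))
        if abs_t[worst] >= t_cut:
            break
        keep.pop(worst)                           # refit without the weakest variable
    return [names[i] for i in keep]

# Synthetic data (assumption): only x1 and x3 truly drive y.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 5.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)
names = ["x1", "x2", "x3", "x4"]
selected = backward_eliminate(X, y, names)
print(selected)
```

Removing one variable per refit, rather than all low-t variables at once, matters because the t-scores of the remaining variables change after each removal.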
2.1 Initial Regression Model
Price = -305880.270 + 365.588(X1) - 482.924(X2) - 9069.079(X3) + 29913.991(X4) +
28647.975(X5) + 33963.153(X6) + 30.378(X7)

Four variables (Years, #Bedrooms, #Bathrooms and Garage) are flagged for removal. They are removed
one at a time, with the regression re-run after each removal. So far, the regression results match
the initial expectation that a variable's importance follows its correlation with price: only the
variables with the highest correlation with price (Living Space, Story and Extra Expenses) are
significant in the model.

2.2 Regression with OUTLIERS and 3 variables:


Although the insignificant variables were removed one at a time, only the final regression model
with the outliers, obtained after all of them had been removed, is shown here.

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.945   0.893      0.889               209624.974

ANOVA
Model        Sum of Squares   df    Mean Square   F         Sig.
Regression   3.55E+13         3     1.18E+13      268.989   .000
Residual     4.26E+12         97    4.39E+10
Total        3.97E+13         100

Coefficients
               Standardized                       Collinearity Statistics
Model          Coefficients (Beta)   t       Sig.   Tolerance   VIF
(Constant)                           -5.84   .000
Living space   0.365                 8.015   .000   0.534       1.874
story          0.315                 7.392   .000   0.608       1.646
extraexp       0.434                 8.04    .000   0.379       2.637
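
As a quick consistency check on the collinearity columns, the VIF of each variable is the reciprocal of its tolerance. The short sketch below recomputes the VIF values from the tolerances in the table; the dictionary names are ours, and small differences come from the tolerances being rounded to three decimals.

```python
# Tolerance and VIF values copied from the Coefficients table above.
tolerance = {"living space": 0.534, "story": 0.608, "extraexp": 0.379}
reported_vif = {"living space": 1.874, "story": 1.646, "extraexp": 2.637}

# VIF = 1 / tolerance
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

for name, value in vif.items():
    print(f"{name}: VIF = {value:.3f} (reported {reported_vif[name]})")
```

All three values are below 3, which is under the commonly cited rule-of-thumb thresholds of 5 or 10, so multicollinearity does not appear to be a serious concern in this model.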

These initial results are not considered satisfactory for several reasons, one of which is the very
large standard error of the estimate. We therefore decided that a better model could be built with
the outliers removed. Outliers were identified for every variable when the data were described in
the proposal, but it is most logical to remove the outliers of the Asking Price variable, since that
is the variable being predicted. Importantly, removing the Asking Price outliers also removes most
of the outliers of the other variables.
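
A minimal sketch of this filtering step is given below. It assumes the usual boxplot rule (values beyond 1.5 times the interquartile range from the quartiles are outliers); the exact rule and quartile method used by SPSS for the proposal's boxplots may differ slightly. The prices are hypothetical and the function name `price_inlier_mask` is ours.

```python
import numpy as np

def price_inlier_mask(prices, k=1.5):
    """Return a boolean mask that is True for prices inside the boxplot fences."""
    prices = np.asarray(prices, dtype=float)
    q1, q3 = np.percentile(prices, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (prices >= lower) & (prices <= upper)

# Hypothetical asking prices in $CAN (not the collected listings).
prices = np.array([250_000, 320_000, 410_000, 480_000,
                   520_000, 610_000, 700_000, 3_200_000])
mask = price_inlier_mask(prices)
kept = prices[mask]
print(kept)
```

Because the mask is computed on the Asking Price only, applying it to every column of the data set drops whole listings, which is how removing price outliers also removes the extreme values of the other variables.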

Descriptive Statistics
Variable      Mean       Std. Deviation   N    Correlation to Price
Price         516286.8   235573           91   -
livingspace   1191.16    504.599          91   0.712
years         37.07      35.08            91   0.217
nobedroom     1.9        0.87             91   0.613
nobathroom    1.53       0.603            91   0.689
garage        0.99       0.527            91   0.504
story         3.48       3.1              91   -0.009
extraexp      8208.49    5094.565         91   0.689

2.3 Regression WITHOUT OUTLIERS and ALL VARIABLES


Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.870   0.757      0.736               120996.488

Change Statistics
R Square Change   F Change   df1   df2   Sig. F Change
0.757             36.879     7     83    .000

ANOVA
Model Sum of Squares df Mean Square F Sig.
Regression 3.7794E+12 7 5.3991E+11 36.879 .000
Residual 1.2151E+12 83 1.464E+10
Total 4.9945E+12 90
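
The Model Summary figures can be cross-checked against the ANOVA table using the standard definitions of R Square, Adjusted R Square, F and the standard error of the estimate. The sketch below is only a verification aid with variable names of our choosing; it reproduces the reported values to within the rounding of the sums of squares.

```python
import math

# Values copied from the ANOVA table above (7 predictors, 91 listings).
ss_regression = 3.7794e12
ss_residual = 1.2151e12
ss_total = 4.9945e12
n, p = 91, 7

df_residual = n - p - 1                           # 83
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
ms_regression = ss_regression / p
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)       # root of the residual mean square

print(f"R Square          = {r_squared:.3f}")
print(f"Adjusted R Square = {adj_r_squared:.3f}")
print(f"F                 = {f_stat:.3f}")
print(f"Std. Error of Est = {std_error_estimate:.1f}")
```

The same calculation applies to the other ANOVA tables in this report by substituting their sums of squares and the matching number of predictors.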

The Years, #Bedrooms, Garage and Story variables are removed from this new no-outliers model because
of their low t-scores (reported as -.848, 1.576 and .675) and high significance values (.399, .119
and .501). The residual statistics of this initial no-outliers model are shown next, for comparison
with the final model obtained after removing both the outliers and the insignificant variables.

ALL VARIABLES NO OUTLIERS - Residuals Statistics
                                    Minimum    Maximum    Mean       Std. Dev.   N
Predicted Value                     180482     1088949    516286.8   204922.36   91
Std. Predicted Value                -1.639     2.795      0          1           91
Standard Error of Predicted Value   20271.03   84623.36   34196.86   10905.393   91

2.4 Regression WITHOUT OUTLIERS and 3 VARIABLES (Final Result)


Here again, the variables were removed one at a time and the regression was re-run after each
removal, leaving 3 significant variables in the final prediction model.

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.859   0.739      0.73                122473

Change Statistics
F Change   df1   df2   Sig. F Change
81.992     3     87    .000

ANOVA
Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   3.69E+12         3    1.2298E+12    81.992   .000
Residual     1.3E+12          87   1.5E+10
Total        4.99E+12         90

The Living Space, #Bathrooms and Extra Expenses variables are significant in this model, as shown by
their high t-scores (4.43, 3.801 and 7.126).

3. VARIABLES NO OUTLIERS - Residuals Statistics


We first notice a large reduction in the errors, both when the outliers were removed with all the
variables kept, and again after the insignificant variables were removed from the no-outliers model.

                                    Minimum    Maximum    Mean       Std. Dev.    N
Predicted Value                     230597.7   1068521    516286.8   202472.131   91
Std. Predicted Value                -1.411     2.727      .000       1            91
Standard Error of Predicted Value   16614.64   78237.45   24220.92   8572.089     91

This reduction in errors indicates that removing the outliers and then removing the insignificant
variables was a sound path towards the final model. The standard error of the estimate was almost
halved after removing the outliers (from 212457.7 to 122473), which is also a good sign of a better
prediction model.

The Adjusted R-Square of the no-outliers model is considerably smaller than when the outliers were
included (0.73 versus 0.889). However, the second model is more realistic: the adjusted R-square
represents the share of the total variation in the Asking Price that the selected variables can
explain, and since real estate is a very case-specific field, it is more plausible that the 3 main
variables (living space, extra expenses and number of bathrooms) explain 73% of the price variation
than 88.9%.

Also, the magnitude of B0 has decreased noticeably compared with the initial models. After the
outliers are removed, the prediction line has a gentler slope and therefore a Y-intercept of smaller
magnitude, because the remaining data are more homogeneous (no more luxury apartments) and are no
longer affected by the extreme prices of the luxurious outliers.

4. Conclusion
We examined our sample and compared the regression analyses with and without outliers. From this
comparison, we built a more reliable prediction model, based on the analysis without outliers, aimed
at apartments within a specific price range. The process leading to the final prediction model
revealed the following:

• The Living Space, Extra Expenses and Number of Bathrooms variables are the most significant of the
original 7 independent variables when predicting the price of an apartment in the Downtown Montreal
area. Together they explain 73% of the total variation in Asking Price.

• The Final Prediction Model using Multiple Regression is:

Asking Price = -8589.73 + 157.75 (Living Area) + 112420.1 (# Bathrooms) + 20.132 (Extra Charges)

In practical terms, holding the other variables constant, each extra square foot of living area
increases the predicted asking price by $157.75, each additional bathroom increases it by
$112,420.10, and each extra dollar of annual extra charges increases it by $20.13.

• Comparing the models on sample apartments with selected variable values, using an interactive
model in Excel:

Apartment sampled                           Predicted Price
Space   Extra exp   Bathrooms   Story       With outliers   No outliers
1000    5,000.00    2           4           381,831.14      474,660.47
1700    20,000.00   3           10          1,332,936.87    999,485.57
2600    50,000.00   3           10          2,619,513.87    1,745,420.57
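
The no-outliers predictions in the table above follow directly from the final prediction model and can be reproduced with a short function. This is a sketch for illustration: the function name `predict_asking_price` and its argument names are ours, while the coefficients are those of the final model in this report. Story is not an input because it is not part of the final model.

```python
def predict_asking_price(living_area_sqft, bathrooms, extra_charges):
    """Final no-outliers model from this report; returns the price in $CAN."""
    return (-8589.73
            + 157.75 * living_area_sqft
            + 112420.1 * bathrooms
            + 20.132 * extra_charges)

# (living area, bathrooms, extra charges) for the three sampled apartments
samples = [
    (1000, 2, 5_000.00),
    (1700, 3, 20_000.00),
    (2600, 3, 50_000.00),
]
predictions = [round(predict_asking_price(*s), 2) for s in samples]
for s, price in zip(samples, predictions):
    print(s, f"{price:,.2f}")
```

The with-outliers predictions cannot be reproduced in the same way here, since only the standardized coefficients of that 3-variable model are reported.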

We see that the predictions can vary greatly depending on the sample used to fit the model (with or
without the outliers, that is, the high-priced apartments). The model with the outliers may in
practice be better at predicting the prices of luxury apartments (extreme living space or extra
expenses). However, removing these outliers yields a sample of apartments with a narrower and more
normally distributed price range, which makes the prediction model more reliable for buyers whose
budget falls within this range or who do not want extreme living space or luxury. The final model,
with its increased internal validity, would help this group of buyers in their decision-making.
