
Proposing a Valuation Model in Real Estate through Machine Learning


Jesús M. Ruiz
Machine Learning Program, Stanford University

Abstract
The aim of this case study is to propose a learning algorithm for the real estate sector.
The two main families of learning algorithms are supervised learning and unsupervised
learning.
In supervised learning, we are given a data set and already know what the correct output
should look like, on the assumption that there is a relationship between the input and the
output. Supervised learning problems are categorized into "regression" and
"classification" problems. Since the housing price is treated as a continuous variable
that can take infinitely many values, I propose a multiple linear regression algorithm.
Given data about the size of houses on the real estate market, the task is to predict their
price. Price as a function of size is a continuous output, so this is a regression problem.
Additionally, the results are compared with those obtained from the SPSS program and
the differences are discussed.

1. Introduction
The data set consists of 47 observations of the following variables:
- Square feet of housing
- Number of rooms
- Price
The independent variables are square feet and number of rooms. The dependent
(predicted) variable is price.
The algorithm is intended to find the function that best fits the given data.
A linear relationship is assumed between the independent and dependent variables.
The algorithm learns from the provided variables in order to be able to make
predictions.
2. Parameters and hypothesis
The data from which the algorithm learns the parameters of the hypothesis are given in
Annex 1 of this study.
The equation that takes the features and parameters as input and predicts the
output value (price) is assumed to be:
$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$
By convention, define $x_0 = 1$.
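As an illustration, the hypothesis can be written as a short Python function. This is a sketch rather than the Octave code used in the study; the name `hypothesis` and the parameter values in the usage lines are hypothetical.

```python
def hypothesis(theta, features):
    """Return h_theta(x) = theta_0*x_0 + ... + theta_n*x_n, with x_0 = 1."""
    x = [1.0] + list(features)  # prepend the constant term x_0 = 1
    if len(theta) != len(x):
        raise ValueError("theta needs one entry per feature plus the intercept")
    return sum(t * xj for t, xj in zip(theta, x))


# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
theta = [1.0, 2.0, 3.0]
print(hypothesis(theta, [10.0, 100.0]))  # 1*1 + 2*10 + 3*100 = 321.0
```

Prepending $x_0 = 1$ inside the function lets the intercept $\theta_0$ be handled like any other parameter.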
2.1. Normalized parameters
The given features are measured on different scales. All the features need to be on
the same scale, so the first step is to normalize each feature, which brings most
values into a range of roughly -1 to 1.
The normalization is given by:
$$x_j := \frac{x_j - \mu_j}{s_j}$$
$x_j$: value of feature $j$.
$\mu_j$: average value of feature $j$.
$s_j$: standard deviation of feature $j$.
The normalization process was implemented as an Octave function.

2.2. Prediction function
Next, the prediction function that evaluates the following equation was created:
$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

3. Cost function
Through this methodology, the algorithm optimizes the cost function.
The cost function measures how close the predictions of the hypothesis are to
the real housing prices.
It is calculated with the following formula:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

4. Gradient descent
The aim is to find the minimum of the cost function described above. Gradient
descent is an iterative optimization algorithm that finds this minimum by taking
steps in the direction of the negative gradient.
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
$\alpha$: the learning rate.
$m$: number of training examples.
$x_j^{(i)}$: value of feature $j$ in training example $i$.

5. Gradient step

6. Linear regression model

7. Results with Octave
- Initial cost: 95968267006010.046875
- Optimized cost: 4553682196675.862305
- Theta (with normalization):
-- 340407.801043
-- 104127.515597
-- -172.205334
The following equation is the one obtained with Octave after running the algorithm:
$$h_\theta(x) = 340407.80 + 104127.51 \, x_1 - 172.20 \, x_2$$
$x_1$: square feet of the house (normalized as in Section 2.1, since theta was obtained with normalization)
$x_2$: number of rooms (normalized in the same way)
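The normalization, cost function and gradient descent update described in Sections 2.1, 3 and 4 can be sketched in Python as follows. This is not the Octave code of the study: the function names are hypothetical, only six rows of Annex 1 are used, and the learning rate (0.1) and number of iterations (400) are assumptions, since the study does not state them. The sample standard deviation is used on the assumption that it corresponds to Octave's default `std`. The printed values will therefore not reproduce the figures above.

```python
from statistics import mean, stdev

# Subset of the Annex 1 rows: (square feet, number of rooms, price in $).
DATA = [
    (2104, 3, 399900),
    (1600, 3, 329900),
    (2400, 3, 369000),
    (1416, 2, 232000),
    (3000, 4, 539900),
    (1985, 4, 299900),
]


def normalize(columns):
    """Scale each feature column to (x - mean) / sample standard deviation."""
    stats = [(mean(col), stdev(col)) for col in columns]
    scaled = [[(v - mu) / s for v in col] for col, (mu, s) in zip(columns, stats)]
    return scaled, stats


def predict(theta, x):
    """Hypothesis h_theta(x) = sum_j theta_j * x_j, where x[0] is 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared prediction errors."""
    m = len(y)
    return sum((predict(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha, iterations):
    """Apply theta_j -= alpha/m * sum((h - y) * x_j) to all j simultaneously."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    history = [cost(theta, X, y)]
    for _ in range(iterations):
        # Errors are computed once per step so every theta_j uses the old theta.
        errors = [predict(theta, x) - yi for x, yi in zip(X, y)]
        grads = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
        history.append(cost(theta, X, y))
    return theta, history


sqft = [row[0] for row in DATA]
rooms = [row[1] for row in DATA]
y = [row[2] for row in DATA]

(sqft_n, rooms_n), stats = normalize([sqft, rooms])
X = [[1.0, a, b] for a, b in zip(sqft_n, rooms_n)]  # x0 = 1 for the intercept

theta, history = gradient_descent(X, y, alpha=0.1, iterations=400)
print("initial cost:", history[0])
print("final cost:  ", history[-1])
print("theta:       ", theta)
```

Because the normalized features have zero mean, the intercept converges to the average price of the rows used, which gives a quick sanity check on any run.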
8. Results with SPSS

Results of the model

Model summary (b)

Statistic                          Model 1
R                                  0.856 (a)
R squared                          0.733
Adjusted R squared                 0.721
Standard error of the estimate     66069.57847
Change statistics:
  R squared change                 0.733
  F change                         60.380
  df1                              2
  df2                              44
  Sig. F change                    0.000
Durbin-Watson                      1.826
a. Predictors: (Constant), SquareFeet, NumberRooms
b. Dependent variable: Price

R squared: 0.733. This means that 73% of the variability in price can be explained by
the variability in number of rooms and square feet.
Standard error of the model: 66069.57
Durbin-Watson: 1.826
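As a cross-check on the model summary, the adjusted R squared and the F statistic can be recomputed from R squared with the standard formulas for a regression with n observations and p predictors. The sketch below takes n = 47 and p = 2 from this study; the function names are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


r2, n, p = 0.733, 47, 2  # values reported in the SPSS model summary
print(round(adjusted_r2(r2, n, p), 3))  # 0.721
print(f_statistic(r2, n, p))
```

The adjusted value matches the 0.721 in the summary. The F statistic comes out at about 60.4, close to the reported 60.380; the small gap is presumably due to R squared being rounded to three decimals.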

Coefficients (a)

Statistic                          (Constant)    NumberRooms    SquareFeet
Unstandardized coefficient B       89597.910     -8738.019      139.211
Standard error                     41767.419     15450.696      14.795
Standardized coefficient Beta                    -0.053         0.885
t                                  2.145         -0.566         9.409
Sig.                               0.037         0.575          0.000
Zero-order correlation                           0.442          0.855
Partial correlation                              -0.085         0.817
Part correlation                                 -0.044         0.733
Collinearity tolerance                           0.686          0.686
VIF                                              1.457          1.457
a. Dependent variable: Price

Equation:
$$h_\theta(x) = 89597.91 + 139.21 \, x_1 - 8738.019 \, x_2$$
$x_1$: square feet of the house
$x_2$: number of rooms
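As a worked example, the SPSS equation can be evaluated for the first house in Annex 1 (2104 square feet, 3 rooms, price 399900):

$$h_\theta(x) = 89597.91 + 139.21 \cdot 2104 - 8738.019 \cdot 3 = 89597.91 + 292897.84 - 26214.057 \approx 356281.69$$

The prediction falls 43618.31 below the observed price, which is within one standard error of the model (66069.57).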
ANNEX 1: INPUT VALUES OF THE ALGORITHM

SQUARE FEET    NUMBER OF ROOMS    PRICE ($)
2104           3                  399900
1600           3                  329900
2400           3                  369000
1416           2                  232000
3000           4                  539900
1985           4                  299900
1534           3                  314900
1427           3                  198999
1380           3                  212000
1494           3                  242500
1940           4                  239999
2000           3                  347000
1890           3                  329999
4478           5                  699900
1268           3                  259900
2300           4                  449900
1320           2                  299900
1236           3                  199900
2609           4                  499998
3031           4                  599000
1767           3                  252900
1888           2                  255000
1604           3                  242900
1962           4                  259900
3890           3                  573900
1100           3                  249900
1458           3                  464500
2526           3                  469000
2200           3                  475000
2637           3                  299900
1839           2                  349900
1000           1                  169900
2040           4                  314900
3137           3                  579900
1811           4                  285900
1437           3                  249900
1239           3                  229900
2132           4                  345000
4215           4                  549000
2162           4                  287000
1664           2                  368500
2238           3                  329900
2567           4                  314000
1200           3                  299000
852            2                  179900
1852           4                  299900
1203           3                  239500
ANNEX 2: RESULTS OF THE ALGORITHM
