
Logistic regression

Freddy Hernández Barajas

1 / 40

Logistic regression

Logistic regression is used for binary outcome data, where Y = 0 (fail) or Y = 1 (success).

The objective is to estimate the success probability:


P(Y = 1|X = x),
where x is a vector of covariates.

2 / 40

Bernoulli and binomial distributions

Consider an experiment that gives only two outcomes, Y = 0 or Y = 1. Let P(Y = 1) = p
and P(Y = 0) = 1 - p. This is a Bernoulli experiment.
The probability mass function is given by

P(Y = y) = p^y (1 - p)^(1-y)

Now consider n independent experiments where each experiment gives only two outcomes,
1 or 0; this is a binomial experiment. Let X be the number of outcomes equal to 1; this is
called a binomial random variable with parameters n and p. It will be denoted by
X ~ Binomial(n, p). The probability mass function is given by

P(X = x) = C(n, x) p^x (1 - p)^(n-x)

Here x = 0, 1, . . . , n, and C(n, x) = n! / (x! (n - x)!) is the binomial coefficient.
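These two pmfs can be checked numerically with base R's dbinom() (a Bernoulli is a Binomial with n = 1); the values of n, p and x below are arbitrary:

```r
# Bernoulli: P(Y = y) = p^y * (1 - p)^(1 - y), i.e. a Binomial(1, p)
p <- 0.3
dbinom(1, size = 1, prob = p)                  # P(Y = 1) = 0.3

# Binomial: P(X = x) = choose(n, x) * p^x * (1 - p)^(n - x)
n <- 5; x <- 2
manual <- choose(n, x) * p^x * (1 - p)^(n - x)
manual                                         # 0.3087
dbinom(x, size = n, prob = p)                  # same value
```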

3 / 40

Logit function

The logit function maps the unit interval onto the real line:

logit(x) = log( x / (1 - x) )

The inverse logit function maps the real line onto the unit interval:

logit^(-1)(x) = exp(x) / (1 + exp(x)) = 1 / (1 + exp(-x))

In logistic regression, the inverse logit function is used to map the linear predictor β'x to a
probability.
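Both functions are easy to write directly; a minimal R sketch (base R already provides them as qlogis() and plogis()):

```r
# logit maps (0, 1) onto the real line; logit.inv is its inverse
logit     <- function(x) log(x / (1 - x))
logit.inv <- function(x) exp(x) / (1 + exp(x))

logit(0.5)                              # 0: probability 1/2 maps to the origin
logit.inv(logit(0.8))                   # 0.8: the functions are inverses
all.equal(logit(0.8), qlogis(0.8))      # TRUE, same as the base R version
all.equal(logit.inv(1.5), plogis(1.5))  # TRUE
```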

4 / 40

Logit function

[Figure: logit(x), ranging from -4 to 4, and logit.inv(x), ranging from 0 to 1, plotted against x.]
5 / 40

Logistic regression

An incorrect way to model the probability would be

P(Y = 1|X = x) = β'x,

since the right-hand side is not restricted to [0, 1]. The correct way to model
P(Y = 1|X = x) is through the logit function:

logit(P(Y = 1|X = x)) = β'x,

and, using the definition of the logit function,

P(Y = 1|X = x) = exp(β'x) / (1 + exp(β'x)) = 1 / (1 + exp(-β'x)),

which implies that

P(Y = 0|X = x) = 1 - P(Y = 1|X = x) = 1 / (1 + exp(β'x))
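As a quick numeric illustration (the coefficients here are made up, not estimated from any data), the two expressions above give complementary probabilities:

```r
b <- c(-1, 0.5)          # hypothetical coefficient vector beta
x <- c(1, 3)             # intercept term plus one covariate value
eta <- sum(b * x)        # linear predictor beta'x

p1 <- exp(eta) / (1 + exp(eta))   # P(Y = 1 | x)
p0 <- 1 / (1 + exp(eta))          # P(Y = 0 | x)
p1 + p0                           # 1: the two probabilities sum to one
plogis(eta)                       # same as p1, via base R's inverse logit
```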

6 / 40

Apartment example

Can we predict whether an apartment will have a balcony from some of its
characteristics?

7 / 40

Apartment example
url <- 'https://raw.githubusercontent.com/fhernanb/datos/master/aptos2015'
datos <- read.table(file=url, header=T)
datos$balcon <- datos$balcon == 'si'
datos$estrato <- as.factor(datos$estrato)
head(datos)
##   precio   mt2 ubicacion estrato alcobas banos balcon parqueadero
## 1     79 43.16     norte       3       3     1   TRUE          si
## 2     93 56.92     norte       2       2     1   TRUE          si
## 3    100 66.40     norte       3       2     2  FALSE          no
## 4    123 61.85     norte       2       3     2   TRUE          si
## 5    135 89.80     norte       4       3     2   TRUE          no
## 6    140 71.00     norte       3       3     2  FALSE          si
##   administracion   avaluo terminado
## 1          0.050 14.92300        no
## 2          0.069 27.00000        si
## 3          0.000 15.73843        no
## 4          0.130 27.00000        no
## 5          0.000 39.56700        si
## 6          0.120 31.14551        si
dim(datos)
## [1] 694  11
8 / 40

Data exploration


with(datos, table(ubicacion, balcon))
##                 balcon
## ubicacion        FALSE TRUE
##   aburra sur        38  131
##   belen guayabal    12   55
##   centro            18   20
##   laureles          25   48
##   norte              2    8
##   occidente         28   41
##   poblado           62  206

with(datos, table(estrato, balcon))

##        balcon
## estrato FALSE TRUE
##       2     3    5
##       3    59  102
##       4    35  103
##       5    35  110
##       6    53  189

9 / 40

Data exploration


with(datos, table(alcobas, balcon))

##        balcon
## alcobas FALSE TRUE
##      1      7    8
##      2     34   94
##      3    120  353
##      4     21   48
##      5      2    6
##      14     1    0

with(datos, table(banos, balcon))

##      balcon
## banos FALSE TRUE
##     1    32   39
##     2   109  308
##     3    33  117
##     4     9   38
##     5     1    7
##     6     1    0
10 / 40

Data exploration

with(datos, boxplot(precio ~ balcon, ylab='Precio (millones)',
                    xlab='Presencia de balcón'))

[Figure: boxplots of precio (millions) for balcon = FALSE and balcon = TRUE.]

11 / 40

Data exploration

with(datos, boxplot(mt2 ~ balcon, ylab='Área (mt2)',
                    xlab='Presencia de balcón'))

[Figure: boxplots of mt2 (area) for balcon = FALSE and balcon = TRUE.]

12 / 40

Apartment example

Splitting the sample into two groups:

1. a training set
2. and a validation set.

set.seed(12345) # to fix the seed
entrenamiento <- sample(1:nrow(datos), 500)
training <- datos[entrenamiento, ]    # training data
validation <- datos[-entrenamiento, ] # validation data

13 / 40

Fitting with gamlss

require(gamlss)
mod1 <- gamlss(balcon ~ precio + mt2 + alcobas + banos + administracion +
avaluo + parqueadero + estrato + ubicacion + terminado,
family=BI, data=training)
## GAMLSS-RS iteration 1: Global Deviance = 533.7589
## GAMLSS-RS iteration 2: Global Deviance = 533.7589

14 / 40

Fitting with gamlss


summary(mod1)

## *******************************************************************
## Family:  c("BI", "Binomial")
##
## Call:  gamlss(formula = balcon ~ precio + mt2 + alcobas + banos +
##     administracion + avaluo + parqueadero + estrato + ubicacion +
##     terminado, family = BI, data = training)
##
## Fitting method: RS()
##
## -------------------------------------------------------------------
## Mu link function:  logit
## Mu Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)              0.356141   1.003835   0.355   0.7229
## precio                   0.003447   0.001426   2.417   0.0160 *
## mt2                     -0.003376   0.004125  -0.818   0.4136
## alcobas                 -0.124762   0.145227  -0.859   0.3907
## banos                    0.040984   0.208165   0.197   0.8440
## administracion          -0.545263   1.084062  -0.503   0.6152
## avaluo                   0.001807   0.001332   1.356   0.1756
## parqueaderosi            0.402516   0.350807   1.147   0.2518
## estrato3                 0.029470   0.934596   0.032   0.9749
## estrato4                 0.780036   0.973970   0.801   0.4236
## estrato5                 0.603801   1.001407   0.603   0.5468
## estrato6                 0.684924   1.100046   0.623   0.5338
## ubicacionbelen guayabal  0.461554   0.432466   1.067   0.2864
## ubicacioncentro         -1.144698   0.519925  -2.202   0.0282 *
## ubicacionlaureles       -1.096556   0.458645  -2.391   0.0172 *
## ubicacionnorte           0.474508   0.864193   0.549   0.5832
## ubicacionoccidente      -0.711255   0.391512  -1.817   0.0699 .
## ubicacionpoblado        -1.211393   0.540687  -2.240   0.0255 *
## terminadosi             -0.026951   0.293539  -0.092   0.9269
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## -------------------------------------------------------------------
## No. of observations in the fit: 500

15 / 40

Fitting with glm

mod2 <- glm(balcon ~ precio + mt2 + alcobas + banos + administracion +
            parqueadero + estrato + ubicacion + terminado,
            family=binomial, data=training)

16 / 40

Fitting with glm


summary(mod2)

## Call:
## glm(formula = balcon ~ precio + mt2 + alcobas + banos + administracion +
##     parqueadero + estrato + ubicacion + terminado, family = binomial,
##     data = training)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.4490  -1.0871   0.5816   0.8303   1.4960
##
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)              0.310747   1.002508   0.310   0.7566
## precio                   0.003869   0.001496   2.586   0.0097 **
## mt2                     -0.002246   0.004102  -0.547   0.5840
## alcobas                 -0.138304   0.145702  -0.949   0.3425
## banos                    0.031576   0.208404   0.152   0.8796
## administracion          -0.135977   1.046630  -0.130   0.8966
## parqueaderosi            0.395144   0.350212   1.128   0.2592
## estrato3                 0.030476   0.933479   0.033   0.9740
## estrato4                 0.778572   0.972628   0.800   0.4234
## estrato5                 0.651129   0.999516   0.651   0.5148
## estrato6                 0.729840   1.100420   0.663   0.5072
## ubicacionbelen guayabal  0.483110   0.431752   1.119   0.2632
## ubicacioncentro         -0.971250   0.504028  -1.927   0.0540 .
## ubicacionlaureles       -1.128321   0.457343  -2.467   0.0136 *
## ubicacionnorte           0.441598   0.864879   0.511   0.6096
## ubicacionoccidente      -0.697484   0.391860  -1.780   0.0751 .
## ubicacionpoblado        -1.225609   0.541238  -2.264   0.0235 *
## terminadosi              0.014899   0.291929   0.051   0.9593
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

17 / 40

Variable selection with gamlss

empty.mod1 <- gamlss(balcon ~ 1, family=BI, data=training)

## GAMLSS-RS iteration 1: Global Deviance = 585.238
## GAMLSS-RS iteration 2: Global Deviance = 585.238

sup <- formula(~ precio + mt2 + alcobas + banos + administracion +
               parqueadero + estrato + ubicacion + terminado)
n <- nrow(training) # number of observations
mod1.final <- stepGAICAll.A(empty.mod1, trace=F, k=log(n),
                            scope=list(lower=~1, upper=sup))

## ---------------------------------------------------
## Start:  AIC = 591.45
## balcon ~ 1
##
## ---------------------------------------------------

18 / 40

Model summary with gamlss

summary(mod1.final)

## *******************************************************************
## Family:  c("BI", "Binomial")
##
## Call:  gamlss(formula = balcon ~ precio, family = BI, data = training,
##     trace = FALSE)
##
## Fitting method: RS()
##
## -------------------------------------------------------------------
## Mu link function:  logit
## Mu Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3111237  0.1839521   1.691   0.0914 .
## precio      0.0024384  0.0006045   4.034 6.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## -------------------------------------------------------------------
## No. of observations in the fit: 500
## Degrees of Freedom for the fit: 2
##       Residual Deg. of Freedom: 498
##                       at cycle: 2
##
## Global Deviance:     565.8916
##             AIC:     569.8916
##             SBC:     578.3208
## *******************************************************************

19 / 40

Backward variable selection with glm

# backward
mod2back <- stepAIC(mod2, trace=TRUE, direction="backward")

## Start:  AIC=572.01
## balcon ~ precio + mt2 + alcobas + banos + administracion + parqueadero +
##     estrato + ubicacion + terminado
##
##                  Df Deviance    AIC
## - estrato         4   540.59 568.59
## - terminado       1   536.02 570.02
## - administracion  1   536.03 570.03
## - banos           1   536.04 570.04
## - mt2             1   536.31 570.31
## - alcobas         1   537.01 571.01
## - parqueadero     1   537.29 571.29
## <none>                536.01 572.01
## - ubicacion       6   553.33 577.33
## - precio          1   544.27 578.27
##
## Step:  AIC=568.59
## balcon ~ precio + mt2 + alcobas + banos + administracion + parqueadero +
##     ubicacion + terminado

20 / 40

Model summary with glm backward

summary(mod2back)

## Call:
## glm(formula = balcon ~ precio + parqueadero + ubicacion, family = binomial,
##     data = training)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.4130  -1.0516   0.6447   0.8211   1.5281
##
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)
## (Intercept)              0.0623128  0.3085552   0.202 0.839956
## precio                   0.0032353  0.0009362   3.456 0.000549 ***
## parqueaderosi            0.6989467  0.2970329   2.353 0.018618 *
## ubicacionbelen guayabal  0.4727034  0.4189748   1.128 0.259219
## ubicacioncentro         -0.7354608  0.4502809  -1.633 0.102398
## ubicacionlaureles       -0.9379806  0.3839490  -2.443 0.014566 *
## ubicacionnorte           0.2617874  0.8381272   0.312 0.754776
## ubicacionoccidente      -0.7547802  0.3807871  -1.982 0.047462 *
## ubicacionpoblado        -0.9794375  0.3419689  -2.864 0.004182 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 585.24  on 499  degrees of freedom
## Residual deviance: 542.43  on 491  degrees of freedom
## AIC: 560.43
##
## Number of Fisher Scoring iterations: 5

21 / 40

Forward variable selection with glm

empty.mod2 <- glm(balcon ~ 1, family=binomial, data = training)
horizonte <- formula(lm(balcon ~ ., data = training))
mod2forw <- stepAIC(empty.mod2, trace=FALSE, direction="forward",
                    scope=horizonte)
mod2forw$anova

## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## balcon ~ 1
##
## Final Model:
## balcon ~ precio + ubicacion + parqueadero
##
##            Step Df  Deviance Resid. Df Resid. Dev      AIC
## 1                                  499   585.2380 587.2380
## 2      + precio  1 19.346390       498   565.8916 569.8916
## 3   + ubicacion  6 17.917074       492   547.9745 563.9745
## 4 + parqueadero  1  5.543247       491   542.4312 560.4312
22 / 40

Model summary with glm forward

summary(mod2forw)

## Call:
## glm(formula = balcon ~ precio + ubicacion + parqueadero, family = binomial,
##     data = training)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.4130  -1.0516   0.6447   0.8211   1.5281
##
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)
## (Intercept)              0.0623128  0.3085552   0.202 0.839956
## precio                   0.0032353  0.0009362   3.456 0.000549 ***
## ubicacionbelen guayabal  0.4727034  0.4189748   1.128 0.259219
## ubicacioncentro         -0.7354608  0.4502809  -1.633 0.102398
## ubicacionlaureles       -0.9379806  0.3839490  -2.443 0.014566 *
## ubicacionnorte           0.2617874  0.8381272   0.312 0.754776
## ubicacionoccidente      -0.7547802  0.3807871  -1.982 0.047462 *
## ubicacionpoblado        -0.9794375  0.3419689  -2.864 0.004182 **
## parqueaderosi            0.6989467  0.2970329   2.353 0.018618 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 585.24  on 499  degrees of freedom
## Residual deviance: 542.43  on 491  degrees of freedom
## AIC: 560.43
##
## Number of Fisher Scoring iterations: 5

23 / 40

Residuals for mod1.final
par(mfrow=c(2, 2))
plot(mod1.final)

[Figure: quantile residual diagnostics for mod1.final: residuals against fitted values, residuals against index, density estimate, and normal QQ plot.]
24 / 40

Worm plot mod1.final

wp(mod1.final)

[Figure: worm plot of the quantile residuals, Deviation against Unit normal quantile.]
25 / 40

Residuals for mod2back

par(mfrow=c(2, 2))
plot(mod2back)

[Figure: standard glm diagnostics (Residuals vs Fitted, Normal QQ, Scale-Location, Residuals vs Leverage); observations 301, 323 and 324 are flagged.]
26 / 40

Residuals for mod2forw

par(mfrow=c(2, 2))
plot(mod2forw)

[Figure: standard glm diagnostics (Residuals vs Fitted, Normal QQ, Scale-Location, Residuals vs Leverage); observations 301, 323 and 324 are flagged.]
27 / 40

Model to use

The estimated model is:

P(Y = balcón|X = x) = exp(0.3111 + 0.0024 Precio) / (1 + exp(0.3111 + 0.0024 Precio))

It can also be written as:

P(Y = balcón|X = x) = 1 / (1 + exp(-0.3111 - 0.0024 Precio))

Another way to write the model is:

log( P(Y = balcón|X = x) / P(Y = sin balcón|X = x) ) = 0.3111 + 0.0024 Precio
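Plugging a price into the fitted equation makes the model concrete; a sketch using the rounded coefficients above (so the result is approximate):

```r
# Estimated P(balcon) as a function of price, with the rounded coefficients
p.balcon <- function(precio) {
  eta <- 0.3111 + 0.0024 * precio   # linear predictor
  exp(eta) / (1 + exp(eta))         # inverse logit
}
p.balcon(100)   # estimated balcony probability for precio = 100
p.balcon(500)   # higher price gives a higher estimated probability
```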

28 / 40

Estimated probabilities for the training set

training$prob.est <- fitted.values(mod1.final)
training[1:15, c('precio', 'mt2', 'balcon', 'prob.est')]

##     precio    mt2 balcon  prob.est
## 501    285 100.00   TRUE 0.7322517
## 607    130  66.00   TRUE 0.6520630
## 527    115  61.00   TRUE 0.6437192
## 613    165  86.00  FALSE 0.6711663
## 315    865 196.00   TRUE 0.9183673
## 115    260  98.00   TRUE 0.7201319
## 224    435 118.00   TRUE 0.7976775
## 350     82  58.00   TRUE 0.6250585
## 500    285  98.00   TRUE 0.7322517
## 678    370 100.00   TRUE 0.7708900
## 24     105  42.00  FALSE 0.6381074
## 105    235  85.00  FALSE 0.7076825
## 502    287  96.00   TRUE 0.7332068
## 1       79  43.16   TRUE 0.6233425
## 267    520 132.00   TRUE 0.8290782
29 / 40

Estimated probabilities for the validation set

validation$prob.est <- predict(mod1.final, newdata=validation)
validation[1:15, c('precio', 'mt2', 'balcon', 'prob.est')]

##    precio    mt2 balcon  prob.est
## 9     160  93.00   TRUE 0.7012728
## 11     47  43.00  FALSE 0.4257300
## 14     70  42.00   TRUE 0.4818139
## 17     80  45.35   TRUE 0.5061983
## 18     90 100.00   TRUE 0.5305826
## 20     90  40.00  FALSE 0.5305826
## 27    115  40.00  FALSE 0.5915434
## 31    130  54.00   TRUE 0.6281199
## 32    130  78.00  FALSE 0.6281199
## 35    143  56.00   TRUE 0.6598195
## 42    155  94.10   TRUE 0.6890807
## 43    155  74.00  FALSE 0.6890807
## 45    160  58.00   TRUE 0.7012728
## 47    180  66.00   TRUE 0.7500415
## 49    185  64.90   TRUE 0.7622336
30 / 40

Correct classification rate with a cutoff of 0.5

This is where the validation set is used.

corte <- 0.5
balcon.est <- validation$prob.est >= corte
res <- table(validation$balcon, balcon.est)
addmargins(res)

##        balcon.est
##         FALSE TRUE Sum
##   FALSE     3   46  49
##   TRUE      1  144 145
##   Sum       4  190 194

100 * sum(diag(res)) / sum(res)

## [1] 75.7732

31 / 40

Correct classification rate with a cutoff of 0.8

corte <- 0.80
balcon.est <- validation$prob.est >= corte
res <- table(validation$balcon, balcon.est)
addmargins(res)

##        balcon.est
##         FALSE TRUE Sum
##   FALSE    19   30  49
##   TRUE     39  106 145
##   Sum      58  136 194

100 * sum(diag(res)) / sum(res)

## [1] 64.43299

32 / 40

Searching for the best cutoff

cutoff <- function(corte) {
  balcon.est <- validation$prob.est >= corte
  res <- table(validation$balcon, balcon.est)
  100 * sum(diag(res)) / sum(res)
}
cutoff <- Vectorize(cutoff)

# Example of the vectorization
cutoff(c(0.5, 0.8))

## [1] 75.77320 64.43299

33 / 40

Searching for the best cutoff by eye

candidatos <- seq(from=0.01, to=0.99, by=0.05)
plot(x=candidatos, y=cutoff(candidatos), type='b',
     xlab='Punto de corte', las=1,
     ylab='Porcentaje de clasificación correcta')

[Figure: correct classification percentage against the cutoff; the curve peaks around 0.5.]

34 / 40

Searching for the best cutoff by optimization

optimize(f=cutoff, interval=c(0, 1), maximum=TRUE)

## $maximum
## [1] 0.4893854
##
## $objective
## [1] 75.7732

35 / 40

Classification table for the validation set

corte <- 0.4893854
clasi.est <- validation$prob.est >= corte
res <- table(clasi.est, validation$balcon)
addmargins(res)

##
## clasi.est FALSE TRUE Sum
##     FALSE     3    1   4
##     TRUE     46  144 190
##     Sum      49  145 194

100 * sum(diag(res)) / sum(res)

## [1] 75.7732

36 / 40

Outputs from glm summary


Here is a quick summary of what you see in the summary(glm.fit) output:

NullDeviance = 2 (ll(SaturatedModel) - ll(NullModel)),
with df = df_Sat - df_Null

ResidualDeviance = 2 (ll(SaturatedModel) - ll(ProposedModel)),
with df = df_Sat - df_Res

The SaturatedModel is a model that assumes each data point has its own parameters
(which means you have n parameters to estimate).
The NullModel assumes the exact opposite, in that it assumes one parameter for all
of the data points, which means you estimate only 1 parameter.
The ProposedModel assumes you can explain your data points with p parameters plus
an intercept term, so you have p + 1 parameters.
Deviance is a distance.
Deviance is a quality-of-fit statistic for a model that is often used for statistical
hypothesis testing.
The greater the deviance, the worse the model fits compared to the best case
(saturated).
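For ungrouped binary data the saturated model fits every observation exactly, so ll(SaturatedModel) = 0 and each deviance reduces to -2 times the corresponding log-likelihood. A sketch checking this with the built-in mtcars data (not the apartment data):

```r
# Binary response am (automatic/manual), single predictor mpg
fit  <- glm(am ~ mpg, family = binomial, data = mtcars)
null <- glm(am ~ 1,   family = binomial, data = mtcars)

# Deviance = 2 * (ll(Saturated) - ll(Model)) = -2 * ll(Model) here
all.equal(-2 * as.numeric(logLik(fit)),  fit$deviance)       # TRUE
all.equal(-2 * as.numeric(logLik(null)), fit$null.deviance)  # TRUE
```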
37 / 40

Deviance
Graphical illustration

38 / 40

Deviance for the example with glm

We will fit with glm the model that was identified as the best model.

mod.glm <- glm(balcon ~ precio, family=binomial, data=training)
mod.glm$deviance

## [1] 565.8916

mod.glm$null.deviance

## [1] 585.238

anova(mod.glm, test='Chisq')

## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: balcon
##
## Terms added sequentially (first to last)
##
##        Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL                     499     585.24
## precio  1   19.346       498     565.89 1.09e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
39 / 40

Deviance for the example with gamlss

We will repeat the previous calculations but with gamlss.


mod.gamlss <- gamlss(balcon ~ precio, family=BI, data=training)
## GAMLSS-RS iteration 1: Global Deviance = 565.8916
## GAMLSS-RS iteration 2: Global Deviance = 565.8916
mod.gamlss.null <- gamlss(balcon ~ 1, family=BI, data=training)
## GAMLSS-RS iteration 1: Global Deviance = 585.238
## GAMLSS-RS iteration 2: Global Deviance = 585.238
-2*(logLik(mod.gamlss.null) - logLik(mod.gamlss))
## 'log Lik.' 19.34639 (df=1)

40 / 40
