Regression-type models
Examples
Using R
R examples
The usual linear regression models assume data come from a Normal distribution, with the mean related to predictors.
Generalized linear models (GLMs) assume data come from some distribution, with a function of the mean related to predictors.
Each model has two parts: randomness and structure.
The model for the randomness: $Y \sim P(\mu, \phi)$
The model for the structure: $g(\mu) = X\beta$
We can choose from many distributions $P$.
We can choose from many link functions $g(\cdot)$ in a separate decision.
(Using a transformation in regression approximately makes both decisions at once.)
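The remark about transformations can be illustrated with a small sketch. The data are simulated and the names `fit.trans` and `fit.glm` are illustrative assumptions, not part of the slides. Transforming the response models the mean of log(y), whereas the log link models the log of the mean and leaves the Normal distribution on the original scale.

```r
# Simulated data (hypothetical): the mean of y grows exponentially with x
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- exp(0.5 + 0.2 * x) + rnorm(50, sd = 0.3)

# Transformation: one decision changes both the mean model and the error model
fit.trans <- lm(log(y) ~ x)

# GLM: the distribution (Normal) and the link (log) are chosen separately
fit.glm <- glm(y ~ x, family = gaussian(link = "log"))

# Both estimate a slope on the log scale, but they are different models
rbind(transformed = coef(fit.trans), glm = coef(fit.glm))
```

The two slopes are usually close but not identical, because the two fits make different assumptions about the randomness.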
Examples
Example
Counts may be modelled using a Poisson distribution.
Usually, use a log link.
Define $\mu = E[Y]$ as the expected count.
The model is
$Y_i \sim \text{Poisson}(\mu_i)$ (random)
$\log \mu_i = X\beta$ (systematic)
The log link ensures $\mu = \exp(X\beta)$ is always positive.
The log link means the effect of the covariates $x_j$ on $\mu$ is multiplicative, not additive.
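The multiplicative effect under a log link can be checked numerically. This is a minimal sketch on simulated data; the data and the names `fit`, `ratio` and `rate.ratio` are assumptions made for illustration.

```r
# Simulated counts (hypothetical data): true model is log(mu) = 1 + 0.3 * x
set.seed(2)
x <- runif(200, 0, 5)
y <- rpois(200, lambda = exp(1 + 0.3 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))

# Fitted means at x = 2 and x = 3 (one unit apart)
mu <- predict(fit, newdata = data.frame(x = c(2, 3)), type = "response")

# The ratio of fitted means equals exp(beta_x), whatever the starting x:
# the effect of x on mu is multiplicative
ratio <- unname(mu[2] / mu[1])
rate.ratio <- unname(exp(coef(fit)["x"]))
c(ratio = ratio, exp.beta = rate.ratio)
```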
Examples
Example
Proportions may be modelled using a binomial distribution.
Often, use a logit link (to get a logistic regression model).
Define $\mu = E[Y]$ as the expected proportion.
The model is
$Y_i \sim \text{Binomial}(\mu_i)$ (random)
$\text{logit}(\mu_i) = X\beta$ (systematic)
or, writing out the logit,
$\log \dfrac{\mu_i}{1 - \mu_i} = X\beta$ (systematic)
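As a quick check of the logit link, the sketch below compares the formula written out by hand with the link stored in R's `binomial` family object. The probabilities in `p` are arbitrary illustrative values.

```r
p <- c(0.1, 0.25, 0.5, 0.9)

# The logit link written out by hand: log(mu / (1 - mu))
eta.hand <- log(p / (1 - p))

# The same link as stored in R's binomial family object
fam <- binomial(link = "logit")
eta.fam <- fam$linkfun(p)

# The inverse link maps any real linear predictor back into (0, 1)
p.back <- fam$linkinv(eta.fam)
```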
To fit a glm, R must know the distribution and link function.
Fit a regression model in R using (for example)
glm( y ~ x1 + log( x2 ) + x3, family=poisson( link="log" ) )
Link function   Definition
identity        $\eta = \mu$
log             $\eta = \log \mu$
inverse         $\eta = 1/\mu$
sqrt            $\eta = \sqrt{\mu}$
logit           $\eta = \text{logit}(\mu)$
probit          $\eta = \text{probit}(\mu)$
cauchit         $\eta = \text{cauchit}(\mu)$
cloglog         $\eta = \text{cloglog}(\mu)$
Each link is available for some of R's families (for example binomial, gaussian, poisson, Gamma).
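The link names listed here can be explored directly with `make.link`, which returns the link and inverse-link functions R uses internally. This is only a sketch; the value `mu = 0.3` is an arbitrary choice that happens to be valid for every link.

```r
links <- c("identity", "log", "inverse", "sqrt",
           "logit", "probit", "cauchit", "cloglog")
mu <- 0.3   # a value that is valid for every link listed

# eta = g(mu) for each link, using the link objects R itself uses
eta <- sapply(links, function(name) make.link(name)$linkfun(mu))

# Applying the inverse link should recover mu in each case
mu.back <- sapply(links, function(name) make.link(name)$linkinv(eta[[name]]))
round(eta, 4)
```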
In R...
To fit a glm in R, we need to specify:
The linear predictor: x1+x2+log(x3)
The distribution: family=poisson
The link function: link="log"
Glms in R?
Fitting glms is locally like fitting a standard regression model.
So most regression concepts have (approximate) analogies for glms.
For example, R allows the user to:
fit glms (use glm)
find important predictors (F-tests using anova; t-tests using summary)
compute residuals (using resid; quantile residuals in package statmod strongly recommended: qresid)
perform diagnostics (using plot, hatvalues, cooks.distance, etc.)
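The functions named in this list can be tried together on a small simulated example. The data below are hypothetical, and the `statmod` call is guarded because that package may not be installed.

```r
# Simulated data (hypothetical): x2 has no real effect on the response
set.seed(3)
x1 <- rnorm(60)
x2 <- rnorm(60)
y  <- rpois(60, lambda = exp(0.5 + 0.6 * x1))

fit <- glm(y ~ x1 + x2, family = poisson(link = "log"))

anova(fit, test = "Chisq")      # sequential tests, one term at a time
coef(summary(fit))              # tests for each coefficient after fitting
r <- resid(fit)                 # deviance residuals by default
h <- hatvalues(fit)             # leverages
d <- cooks.distance(fit)        # Cook's distance

# Quantile residuals need the statmod package, if it is installed
if (requireNamespace("statmod", quietly = TRUE)) {
  qr <- statmod::qresid(fit)
}
```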
Example: Poisson
Example
                   3 children < 14 (C = 1)      Others (C = 0)
                   SLE        No SLE            SLE        No SLE
                   (S = 1)    (S = 0)           (S = 1)    (S = 0)
Depres. (D = 1)    9          0                 24         4
OK (D = 0)         12         20                119        231
Example
To fit the minimal model in R:
dep.glm <- glm( Counts ~ C + S + D, family=poisson(link=log) )
The data are counts, so use a poisson family (and default log link).
Initially, use the linear predictor C + S + D.
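For readers who want to run the minimal model, the counts from the table can be entered as a data frame. This is a sketch: the data frame name `dep` and the 0/1 factor coding are assumptions, chosen to match the `data = dep` call and the `D1`, `S1`, `C1` coefficient labels that appear in the later output.

```r
# Counts entered from the table above; D, S and C are 0/1 factors
dep <- data.frame(
  Counts = c(9, 0, 12, 20, 24, 4, 119, 231),
  D = factor(c(1, 1, 0, 0, 1, 1, 0, 0)),
  S = factor(c(1, 0, 1, 0, 1, 0, 1, 0)),
  C = factor(c(1, 1, 1, 1, 0, 0, 0, 0))
)

dep.glm <- glm(Counts ~ C + S + D, family = poisson(link = log), data = dep)

deviance(dep.glm)      # residual deviance of the main-effects model
df.residual(dep.glm)   # its residual degrees of freedom
```

If the data are entered as assumed, the residual deviance should agree with the value reported after adding D, S and C in the sequential analysis of deviance (54.35 on 4 degrees of freedom).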
Example
What predictors are significant? Sequential test:
> anova(dep.full, test = "Chisq")
Analysis of Deviance Table

Model: poisson, link: log

Response: Counts

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev
NULL                       7     717.32
D       1   330.63         6     386.69
S       1    19.92         5     366.77
C       1   312.41         4      54.35
D:S     1    44.36         3       9.99
D:C     1     7.45         2       2.54
S:C     1     0.54         1       2.00
D:S:C   1     2.00         0  4.122e-10

Example
What predictors are significant? Post-fit test:
> summary(dep.full)

Call:
glm(formula = Counts ~ D * S * C, family = poisson(link = log),
    data = dep)

Deviance Residuals:
[1] 0 0 0 0 0 0

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)    5.4424     0.0658  82.718  < 2e-16 ***
D1            -4.0561     0.5043  -8.043 8.77e-16 ***
S1            -0.6633     0.1128  -5.878 4.15e-09 ***
C1            -2.4467     0.2331 -10.497  < 2e-16 ***
D1:S1          2.4550     0.5517   4.450 8.60e-06 ***
D1:C1        -21.2422 42247.1657  -0.001     1.00
S1:C1          0.1525     0.3822   0.399     0.69
D1:S1:C1      22.5556 42247.1657   0.001     1.00
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 7.1732e+02  on 7  degrees of freedom
Residual deviance: 4.1223e-10  on 0  degrees of freedom
AIC: 51.42
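The sequential analysis of deviance table lists deviance reductions without p-values. Assuming the usual chi-squared approximation for a Poisson glm, they can be computed from the tabulated values as sketched below; the vector names `dev`, `df` and `p` are illustrative.

```r
# Deviance reductions copied from the sequential table; each term has 1 df
dev <- c(D = 330.63, S = 19.92, C = 312.41,
         "D:S" = 44.36, "D:C" = 7.45, "S:C" = 0.54, "D:S:C" = 2.00)
df  <- rep(1, length(dev))

# Upper-tail chi-squared probability for each sequential term
p <- pchisq(dev, df = df, lower.tail = FALSE)
names(p) <- names(dev)
signif(p, 3)
```

Because the terms are added first to last, each p-value tests a term given only the terms listed before it.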
Example
To fit one suggested model in R:
dep.opt <- glm( Counts ~ C + S * D, family=poisson(link=log) )
Plots: QQ plots
> library(statmod)
> qqnorm(qresid(dep.opt))
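The idea behind quantile residuals for a Poisson fit can be sketched by hand. This is an illustration of the randomized quantile residual definition on simulated data, written from scratch; it is not the code of `statmod::qresid`, and every name in it is an assumption.

```r
# Simulated Poisson data (hypothetical)
set.seed(4)
x <- runif(100)
y <- rpois(100, lambda = exp(1 + x))
fit <- glm(y ~ x, family = poisson)
mu <- fitted(fit)

# For a discrete response, draw u uniformly between F(y - 1) and F(y),
# where F is the fitted Poisson distribution function
lower <- ppois(y - 1, lambda = mu)
upper <- ppois(y, lambda = mu)
u <- runif(length(y), min = lower, max = upper)

# Map to the Normal scale; roughly N(0, 1) if the model fits
qr.hand <- qnorm(u)
```

Calling `qqnorm(qr.hand)` then gives a QQ plot that can be read like the ones produced from `qresid`.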
[Figure: index plot of cooks.distance(dep.opt), and Normal QQ plot of the quantile residuals (Sample Quantiles against Theoretical Quantiles)]
Calling plot on a fitted glm can produce six plots, four of them by default:
Residuals $r_i$ vs fitted values (default)
$\sqrt{|r_i|}$ vs fitted values (default)
a QQ plot (default)
A plot of Cook's distance $D_i$
A plot of $r_i$ vs $h_i$ with contours of equal $D_i$ (default)
A plot of $D_i$ vs $h_i/(1 - h_i)$, with contours of equal $D_i$
> par(mfrow = c(2, 2))
> plot(dep.opt)
> par(mfrow = c(1, 1))
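To see all six plots rather than the four defaults, the `which` argument of `plot` can be used. The sketch below runs on simulated data (an assumption, so that it is self-contained) and draws to a null device so it also works without a screen; drop the `pdf(NULL)` and `dev.off()` lines to see the plots interactively.

```r
# Simulated Poisson data (hypothetical)
set.seed(5)
x <- runif(40, 0, 2)
y <- rpois(40, lambda = exp(0.5 + x))
fit <- glm(y ~ x, family = poisson)

pdf(NULL)                  # null device, so this also runs without a screen
par(mfrow = c(2, 3))
plot(fit, which = 1:6)     # all six plots, not just the four defaults
par(mfrow = c(1, 1))
invisible(dev.off())

# The quantities behind the leverage and Cook's distance plots
h <- hatvalues(fit)
D <- cooks.distance(fit)
```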
[Figure: the four default diagnostic plots for dep.opt: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours]
[Figure: Residuals vs Leverage for glm(Counts ~ D * S + C)]
Example
Columns: Hours, No. turbines, No. fissures, Prop. fissures
No. turbines: 39 53 33 73 30 39
No. fissures: 0 4 2 7 5 9
Example
Three ways to fit binomial glms in R; here are two:
td.glm <- glm( prop ~ Hours, weights=Turbines, family=binomial(link=probit) )
td.glm <- glm( prop ~ Hours, weights=Turbines, family=binomial(link=cloglog) )
[Figure: Prop. fissures against Hours of use]
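Two common ways of specifying a binomial glm in R are a proportion with weights, and a two-column response of successes and failures. Whether these are the two forms the slides intended is an assumption. The sketch below uses made-up data (the values and the column name `Fissures` are hypothetical) to show that both forms give the same fit.

```r
# Hypothetical data, made up for illustration only
td <- data.frame(
  Hours    = c(500, 1000, 1500, 2000, 2500, 3000),
  Turbines = c(40, 50, 30, 60, 35, 45),
  Fissures = c(1, 3, 3, 9, 8, 14)
)
td$prop <- td$Fissures / td$Turbines

# Way 1: proportions as the response, group sizes as prior weights
fit.prop <- glm(prop ~ Hours, weights = Turbines,
                family = binomial(link = "logit"), data = td)

# Way 2: a two-column response of (successes, failures)
fit.cbind <- glm(cbind(Fissures, Turbines - Fissures) ~ Hours,
                 family = binomial(link = "logit"), data = td)

rbind(prop = coef(fit.prop), cbind = coef(fit.cbind))
```

A third form, not shown here, uses one 0/1 response per individual unit.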
Example
The fitted model is:
> summary(td.glm)

Call:
glm(formula = prop ~ Hours, family = binomial(link = logit),
    weights = Turbines)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5055  -0.7647  -0.3036   0.4901   2.0943

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.9235966  0.3779589 -10.381   <2e-16 ***
Hours        0.0009992  0.0001142   8.754   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 112.670  on 10  degrees of freedom
Residual deviance:  10.331  on  9  degrees of freedom
AIC: 49.808

Example
> td.cf <- signif(coef(td.glm), 3)
> td.cf
(Intercept)       Hours
  -3.920000    0.000999
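The rounded coefficients can be turned into fitted proportions by hand. This sketch assumes the logit link shown in the summary call; the chosen values of `hours` are arbitrary illustrations.

```r
# Rounded coefficients as reported by td.cf
b0 <- -3.92
b1 <- 0.000999

# Fitted proportion at a given number of hours, via the inverse logit
hours <- c(1000, 2000, 4000)
p.hat <- plogis(b0 + b1 * hours)

# Multiplicative change in the odds for each extra 1000 hours of use
odds.ratio.1000 <- exp(1000 * b1)
```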
[Figure: index plots of hatvalues(td.glm) and cooks.distance(td.glm)]
Plots: QQ plots
> qqnorm(qresid(td.glm))
[Figure: Normal Q-Q Plot of the quantile residuals (Sample Quantiles against Theoretical Quantiles)]
Rates
Number of lung cancer patients is a count, so use a Poisson glm:
[Figure: counts plotted by Age group and by City]
But the lung cancer rate is probably more useful.
The expected cancer rate is $E[Y_i/T_i] = E[Y_i]/T_i = \mu_i/T_i$, where $\mu_i$ is the expected number of cancers.
Note $T_i$ is known and not random.
Using a logarithmic link, model the cancer rate as $\log(\mu_i/T_i) = X\beta$, or $\log \mu_i = \log T_i + X\beta$.
$\log T_i$ is an offset: a component of the linear predictor with a known parameter value, here one.
[Figure: counts and rates plotted by City (Fredericia, Horsens, Kolding, Vejle) and by Age group (40-54, 55-59, 60-64, 65-69, 70-74, >74)]
Rates
To model the lung cancer rate, use a Poisson glm with an offset:
lc.glm <- glm( Cases ~ offset( log(Population)) + City + Age, family=poisson(link=log) )
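An offset can be given either inside the formula or through the `offset` argument of `glm`; the sketch below shows that the two agree. The data frame `lc` is made up for illustration (the city labels, populations and counts are hypothetical, not the lung cancer data from the slides).

```r
# Hypothetical data, made up for illustration only
set.seed(6)
lc <- data.frame(
  City = factor(rep(c("A", "B"), each = 6)),
  Age  = factor(rep(1:6, times = 2)),
  Population = round(runif(12, 500, 3000))
)
lc$Cases <- rpois(12, lambda = lc$Population * 0.01)

# Offset written inside the formula
fit.formula <- glm(Cases ~ offset(log(Population)) + City + Age,
                   family = poisson(link = log), data = lc)

# The same offset passed through the offset argument
fit.arg <- glm(Cases ~ City + Age, offset = log(Population),
               family = poisson(link = log), data = lc)

# Fitted rates per person: expected count divided by population
rate.hat <- fitted(fit.formula) / lc$Population
```

No coefficient is estimated for the offset term, which is what fixes its parameter value at one.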
[Figure: index plot of hatvalues(lc.glm)]
Plots: QQ plots
> library(statmod)
> qqnorm(qresid(lc.glm))
[Figure: index plot of cooks.distance(lc.glm), and Normal QQ plot of the quantile residuals (Sample Quantiles against Theoretical Quantiles)]
Other models
We have looked at fitting glms to
Proportions
Counts
Rates