
Regression-type models

Examples

Using R

R examples


Generalized linear models in R

Dr Peter K Dunn
Department of Mathematics and Computing, University of Southern Queensland
http://www.usq.edu.au

ASC, July 2008

Regression models

The usual linear regression models assume data come from a Normal distribution, with the mean related to predictors.
Generalized linear models (GLMs) assume data come from some distribution, with a function of the mean related to predictors.

Model              Randomness                  Structure
Regression model   $Y \sim N(\mu, \sigma^2)$   $\mu = X\beta$
GLM                $Y \sim P(\mu, \phi)$       $g(\mu) = X\beta$


Normal regression models are not always appropriate

There are obvious occasions when a Normal distribution is inappropriate:
- Counts cannot have normal distributions: they are non-negative integers
- Proportions cannot have normal distributions: they are constrained between 0 and 1
- Lots of continuous data are non-negative and have non-constant variance
In all cases, the variance cannot be constant since a boundary on the responses exists

Generalized linear models

Generalized linear models have two main components:
1. The model for the randomness: $Y \sim P(\mu, \phi)$
2. The model for the structure: $g(\mu) = X\beta$

We can choose from many distributions $P$
We can choose from many link functions $g(\cdot)$ in a separate decision
(Using a transformation in regression approximately makes both decisions at once)


Examples

Example: Counts may be modelled using a Poisson distribution
- Usually, use a log link
- Define $\mu = E[Y]$ as the expected count
- The model is
    $Y_i \sim \text{Poisson}(\mu_i)$   (random)
    $\log \mu_i = X\beta$   (systematic)
- The log link ensures $\mu = \exp(X\beta)$ is always positive
- The log link means the effect of the covariates $x_j$ on $\mu$ is multiplicative, not additive

Examples

Example: Proportions may be modelled using a binomial distribution
- Often, use a logit link (to get a logistic regression model)
- Define $\mu = E[Y]$ as the expected proportion
- The model is
    $Y_i \sim \text{Binomial}(\mu_i)$   (random)
    $\text{logit}(\mu_i) = X\beta$   (systematic)
  or, writing out the logit,
    $Y_i \sim \text{Binomial}(\mu_i)$   (random)
    $\log\frac{\mu_i}{1 - \mu_i} = X\beta$   (systematic)
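A small numerical sketch of what the two links do. The coefficient values b0 and b1 and the linear predictor values eta are hypothetical, chosen only for illustration; plogis() and qlogis() are base R's inverse logit and logit.

```r
# Hypothetical coefficients for a log-link model: log(mu) = b0 + b1 * x
b0 <- 0.5
b1 <- 0.2
x  <- 0:3
mu <- exp(b0 + b1 * x)               # inverse of the log link; always positive
ratios <- mu[-1] / mu[-length(mu)]   # change in mu per unit increase in x
print(ratios)                        # every ratio equals exp(b1): a multiplicative effect

# Logit link: qlogis() is the logit, plogis() is its inverse
eta <- c(-3, 0, 3)                   # hypothetical linear predictor values
p   <- plogis(eta)                   # always strictly between 0 and 1
print(p)
print(qlogis(p))                     # recovers eta
```

Whatever value the linear predictor takes, the inverse link maps it back into the range allowed for the mean.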


Basic fitting of glms in R

Fit a regression model in R using
lm( y ~ x1 + log( x2 ) + x3 )

To fit a glm, R must know the distribution and link function
Fit a glm in R using (for example)
glm( y ~ x1 + log( x2 ) + x3, family=poisson( link="log" ) )

What distributions can I choose?

- gaussian: a Gaussian (Normal) distribution
- binomial: a binomial distribution for proportions
- poisson: a Poisson distribution for counts
- Gamma: a gamma distribution for positive continuous data
- inverse.gaussian: an inverse Gaussian distribution for positive continuous data


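What link function can I choose?

(default = the family's default link; yes = also accepted; - = not available)

Link function   Formula                        gaussian   binomial   poisson
identity        $\eta = \mu$                   default    -          yes
log             $\eta = \log \mu$              yes        yes        default
inverse         $\eta = 1/\mu$                 yes        -          -
sqrt            $\eta = \sqrt{\mu}$            -          -          yes
logit           $\eta = \text{logit}(\mu)$     -          default    -
probit          $\eta = \text{probit}(\mu)$    -          yes        -
cauchit         $\eta = \text{cauchit}(\mu)$   -          yes        -
cloglog         $\eta = \text{cloglog}(\mu)$   -          yes        -

What link function can I choose?

Link function   Formula              Gamma      inverse.gaussian
identity        $\eta = \mu$         yes        yes
log             $\eta = \log \mu$    yes        yes
inverse         $\eta = 1/\mu$       default    yes
1/mu^2          $\eta = 1/\mu^2$     -          default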
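The default link of each family can be checked directly in R, since every family function returns an object that records its distribution and link. A brief sketch; the variable names fams, defaults and pr are illustrative only.

```r
# Each family function returns an object recording the distribution and its link
fams <- list(gaussian(), binomial(), poisson(), Gamma(), inverse.gaussian())
defaults <- sapply(fams, function(f) f$link)
names(defaults) <- sapply(fams, function(f) f$family)
print(defaults)                    # default link for each family

# A non-default link is requested through the link argument
pr <- binomial(link = "probit")
print(pr$linkfun(0.5))             # probit of 0.5
print(pr$linkinv(0))               # inverse probit at 0
```

Asking for a link the family does not support, for example poisson(link = "logit"), stops with an error listing the accepted links.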

In R...

To fit a glm in R, we need to specify:
- The linear predictor: x1+x2+log(x3)
- The distribution: family=poisson
- The link function: link="log"

They work together like this:
glm( y ~ x1 + x2 + log(x3), family=poisson(link = "log") )

Glms in R?

Fitting glms is locally like fitting a standard regression model
So most regression concepts have (approximate) analogies for glms
For example, R allows the user to:
- fit glms (use glm)
- find important predictors (F-tests using anova; t-tests using summary)
- compute residuals (using resid; quantile residuals in package statmod strongly recommended: qresid)
- perform diagnostics (using plot, hatvalues, cooks.distance, etc.)
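A minimal sketch of the three pieces (linear predictor, family, link) working together on simulated data. The data frame sim, the sample size and the generating coefficients are all assumptions made for this illustration; they are not from the slides.

```r
set.seed(1)                                    # for reproducibility
n   <- 200
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n, 1, 5))
# Hypothetical true coefficients, used only to generate the counts
sim$y <- rpois(n, lambda = exp(0.5 + 1.0 * sim$x1 - 0.5 * sim$x2 + 0.3 * log(sim$x3)))

# Linear predictor, distribution and link function, as on the slide
fit <- glm(y ~ x1 + x2 + log(x3), family = poisson(link = "log"), data = sim)

print(coef(fit))                   # estimates on the log scale
print(anova(fit, test = "Chisq"))  # sequential tests of the predictors
print(head(resid(fit)))            # deviance residuals (the default type)
```

The same calls used after lm() (coef, anova, resid, summary) carry over to the glm object.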


Example: Poisson

Example

                   3 children < 14 (C = 1)          Others (C = 0)
                   SLE (S = 1)   No SLE (S = 0)     SLE (S = 1)   No SLE (S = 0)
Depres. (D = 1)          9              0                24              4
OK (D = 0)              12             20               119            231

The data are counts, so use a poisson family (and default log link)
Initially, use the linear predictor C + S + D

Example

To fit the minimal model in R:
dep.glm <- glm( Counts ~ C + S + D, family=poisson(link=log) )

To fit the full model in R:
dep.full <- glm( Counts ~ C * S * D, family=poisson(link=log) )

We assume all qualitative variables are declared as factors
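One possible way to enter the eight cell counts from the table and fit both models. The data frame name dep follows the data = dep argument visible in the summary() call shown later in the slides; the row ordering here is an arbitrary choice.

```r
# Cell counts transcribed from the table; D, S, C coded 0/1 and declared as factors
dep <- data.frame(
  Counts = c(9, 0, 12, 20, 24, 4, 119, 231),
  D = factor(c(1, 1, 0, 0, 1, 1, 0, 0)),
  S = factor(c(1, 0, 1, 0, 1, 0, 1, 0)),
  C = factor(c(1, 1, 1, 1, 0, 0, 0, 0))
)

# Minimal (main effects only) and full (all interactions) models
dep.glm  <- glm(Counts ~ C + S + D, family = poisson(link = "log"), data = dep)
dep.full <- glm(Counts ~ C * S * D, family = poisson(link = "log"), data = dep)

print(deviance(dep.glm))      # residual deviance of the minimal model
print(df.residual(dep.glm))   # 8 cells minus 4 parameters
print(df.residual(dep.full))  # the full model is saturated
```

Because the full model has as many parameters as cells, it reproduces the counts exactly and leaves no residual degrees of freedom.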


Example

What predictors are significant? Sequential test:

> anova(dep.full, test = "Chisq")
Analysis of Deviance Table

Model: poisson, link: log

Response: Counts

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
NULL                      7     717.32
D      1   330.63         6     386.69  7.005e-74
S      1    19.92         5     366.77  8.066e-06
C      1   312.41         4      54.35  6.505e-70
D:S    1    44.36         3       9.99  2.725e-11
D:C    1     7.45         2       2.54       0.01
S:C    1     0.54         1       2.00       0.46
D:S:C  1     2.00         0  4.122e-10       0.16

Example

What predictors are significant? Post-fit test:

> summary(dep.full)

Call:
glm(formula = Counts ~ D * S * C, family = poisson(link = log), data = dep)

Deviance Residuals:
[1] 0 0 0 0 0 0

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)    5.4424     0.0658  82.718  < 2e-16 ***
D1            -4.0561     0.5043  -8.043 8.77e-16 ***
S1            -0.6633     0.1128  -5.878 4.15e-09 ***
C1            -2.4467     0.2331 -10.497  < 2e-16 ***
D1:S1          2.4550     0.5517   4.450 8.60e-06 ***
D1:C1        -21.2422 42247.1657  -0.001     1.00
S1:C1          0.1525     0.3822   0.399     0.69
D1:S1:C1      22.5556 42247.1657   0.001     1.00
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 7.1732e+02  on 7  degrees of freedom
Residual deviance: 4.1223e-10  on 0  degrees of freedom
AIC: 51.42

Number of Fisher Scoring iterations: 20


Example

To fit one suggested model in R:
dep.opt <- glm( Counts ~ C + S * D, family=poisson(link=log) )

Note that S * D means S + D and the interaction S : D

Plots: Hat diagonals

> plot(hatvalues(dep.opt), type = "h", lwd = 2,
+     col = "blue")

[Figure: hatvalues(dep.opt) plotted against Index]
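The expansion of S * D in a model formula can be checked without fitting anything, by asking R for the term labels of the formula. A short sketch; the names labels_star and labels_long are illustrative.

```r
# Term labels of the formula using the * shorthand
labels_star <- attr(terms(Counts ~ C + S * D), "term.labels")
# Term labels of the formula with main effects and interaction written out
labels_long <- attr(terms(Counts ~ C + S + D + S:D), "term.labels")

print(labels_star)
print(identical(labels_star, labels_long))  # both formulas describe the same model
```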


Plots: Cook's distance

> plot(cooks.distance(dep.opt), type = "h", lwd = 2,
+     col = "blue")

[Figure: cooks.distance(dep.opt) plotted against Index]

Plots: QQ plots

> library(statmod)
> qqnorm(qresid(dep.opt))

[Figure: Normal Q-Q Plot of the quantile residuals; Sample Quantiles against Theoretical Quantiles]


Typing plot( glm.object ) produces six plots, four by default:

1. Residuals $r_i$ vs fitted values (default)
2. $\sqrt{|r_i|}$ vs fitted values (default)
3. a QQ plot (default)
4. A plot of Cook's distance $D_i$
5. A plot of $r_i$ vs $h_i$, with contours of equal $D_i$ (default)
6. A plot of $D_i$ vs $h_i/(1 - h_i)$, with contours of equal $D_i$

> par(mfrow = c(2, 2))
> plot(dep.opt)
> par(mfrow = c(1, 1))

[Figure: four diagnostic panels for dep.opt: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage (with Cook's distance contours)]

> plot(dep.opt, which = 5)

[Figure: Residuals vs Leverage for glm(Counts ~ D * S + C); Std. deviance resid. against Leverage, with Cook's distance contours]

Example

Hours   No. turbines   No. fissures   Prop. fissures
 400         39              0             0.00
1000         53              4             0.08
1400         33              2             0.06
1800         73              7             0.10
2200         30              5             0.17
2600         39              9             0.23
3000         42              9             0.21
3400         13              6             0.46
3800         34             22             0.65
4200         40             21             0.53
4600         36             21             0.58

The data are proportions: use binomial family


Example

[Figure: Proportion of turbines with fissures plotted against Hours of use]

Example

Three ways to fit binomial glms in R; here are two:

1. Proportions as the response, with the number of trials as weights:
td.glm <- glm( prop ~ Hours, weights=Turbines, family=binomial(link=logit) )

2. A two-column response of successes and failures (so the second column is Turbines - Fissures, not Turbines):
td.glm <- glm( cbind(Fissures, Turbines - Fissures) ~ Hours, family=binomial(link=logit) )

Can use alternative links:
td.glm <- glm( prop ~ Hours, weights=Turbines, family=binomial(link=probit) )
td.glm <- glm( prop ~ Hours, weights=Turbines, family=binomial(link=cloglog) )

We use the default logit link
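A sketch of the two response forms applied to the turbine data transcribed from the table in these slides. The data frame name td and the object names fit1 and fit2 are illustrative; prop is computed from the counts rather than taken from the rounded column.

```r
# Turbine data transcribed from the slides
td <- data.frame(
  Hours    = c(400, 1000, 1400, 1800, 2200, 2600, 3000, 3400, 3800, 4200, 4600),
  Turbines = c(39, 53, 33, 73, 30, 39, 42, 13, 34, 40, 36),
  Fissures = c(0, 4, 2, 7, 5, 9, 9, 6, 22, 21, 21)
)
td$prop <- td$Fissures / td$Turbines   # unrounded proportions

# Form 1: proportion response with the number of trials as weights
fit1 <- glm(prop ~ Hours, weights = Turbines,
            family = binomial(link = "logit"), data = td)

# Form 2: two-column response of successes and failures
fit2 <- glm(cbind(Fissures, Turbines - Fissures) ~ Hours,
            family = binomial(link = "logit"), data = td)

print(coef(fit1))
print(all.equal(coef(fit1), coef(fit2), tolerance = 1e-6))  # same fitted model
```

Both forms give the binomial likelihood the same successes and trials, so the estimates agree.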

Example

The fitted model is:

> summary(td.glm)

Call:
glm(formula = prop ~ Hours, family = binomial(link = logit), weights = Turbines)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5055  -0.7647  -0.3036   0.4901   2.0943

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.9235966  0.3779589 -10.381   <2e-16 ***
Hours        0.0009992  0.0001142   8.754   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 112.670  on 10  degrees of freedom
Residual deviance:  10.331  on  9  degrees of freedom
AIC: 49.808

Number of Fisher Scoring iterations: 4

Example

> td.cf <- signif(coef(td.glm), 3)
> td.cf
(Intercept)       Hours
  -3.920000    0.000999

From R output, the fitted model is
$\log\frac{\hat{\mu}_i}{1 - \hat{\mu}_i} = -3.92 + 0.000999\,\text{Hours}$
where $\hat{\mu}_i$ is the expected proportion of turbines with fissures
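To read the fitted logistic model on the proportion scale, the linear predictor can be pushed back through the inverse logit. A sketch using the rounded coefficients reported on the slide; the chosen values of Hours and the variable names are illustrative.

```r
# Rounded coefficients as reported on the slide
b <- c(intercept = -3.92, hours = 0.000999)

hours  <- c(1000, 4000)                          # illustrative hours of use
eta    <- b[["intercept"]] + b[["hours"]] * hours
mu_hat <- plogis(eta)                            # fitted proportion with fissures
print(round(mu_hat, 3))

# With a logit link, coefficients act multiplicatively on the odds
odds_ratio_1000h <- exp(1000 * b[["hours"]])     # odds multiplier per extra 1000 hours
print(round(odds_ratio_1000h, 3))
```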


Plots: Hat diagonals

> plot(hatvalues(td.glm), type = "h", lwd = 2, col = "blue")

[Figure: hatvalues(td.glm) plotted against Index]

Plots: Cook's distance

> plot(cooks.distance(td.glm), type = "h", lwd = 2,
+     col = "blue")

[Figure: cooks.distance(td.glm) plotted against Index]


Plots: QQ plots

> qqnorm(qresid(td.glm))

[Figure: Normal Q-Q Plot of the quantile residuals; Sample Quantiles against Theoretical Quantiles]

Example

Lung cancer cases (C) and population (P) by age group in four cities:
F = Fredericia, H = Horsens, K = Kolding, V = Vejle

            F             H             K             V
Age       C     P       C     P       C     P       C     P
40-54    11  3059      13  2879       4  3142       5  2520
55-59    11   800       6  1083       8  1050       7   878
60-64    11   710      15   923       7   895      10   839
65-69    10   581      10   834      11   702      14   631
70-74    11   509      12   634       9   535       8   539
74+      10   605       2   782      12   659       7   619


Plots: Number of cancers

[Figure: No. Lung cancers by Age group (40-54 to >74) and by City (Fredericia, Horsens, Kolding, Vejle)]

Rates

Number of lung cancer patients is a count, so use a Poisson glm:
glm( Cases ~ City + Age, family=poisson(link=log) )

But lung cancer rate probably more useful
Expected cancer rate is $E[Y_i/T_i] = E[Y_i]/T_i = \mu_i/T_i$, where $\mu_i$ is the expected number of cancers. Note $T_i$ is known and not random.
Using a logarithmic link, model the cancer rate as
$\log(\mu_i/T_i) = X\beta$   or   $\log \mu_i = \log T_i + X\beta$
$\log T_i$ is an offset: a component of the linear predictor with a known parameter value, here one.
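The role of the offset can be seen on a tiny made-up data set: with log(exposure) as an offset, the fitted count divided by the exposure equals the exponential of the remaining linear predictor. The data frame toy and its values are hypothetical and serve only to illustrate the algebra.

```r
# Hypothetical toy data: counts y observed over known exposures
toy <- data.frame(
  y        = c(2, 5, 9, 4),
  exposure = c(100, 200, 300, 150),
  x        = c(0, 1, 2, 1)
)

# log(exposure) enters the linear predictor with its coefficient fixed at one
fit <- glm(y ~ x + offset(log(exposure)), family = poisson(link = "log"), data = toy)

xb       <- drop(model.matrix(fit) %*% coef(fit))  # X beta only, without the offset
rate_hat <- fitted(fit) / toy$exposure             # fitted count per unit exposure

print(all.equal(unname(rate_hat), unname(exp(xb))))  # the model is a model for the rate
```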


Plots: Rates of cancer

[Figure: No. Lung cancers and Lung cancer rate, each plotted by Age group (40-54 to >74) and by City (Fredericia, Horsens, Kolding, Vejle)]


Rates

To model lung cancer rate, use a Poisson glm with an offset:
lc.glm <- glm( Cases ~ offset( log(Population)) + City + Age, family=poisson(link=log) )

Plots: Hat diagonals

> plot(hatvalues(lc.glm), type = "h", lwd = 2, col = "blue")

[Figure: hatvalues(lc.glm) plotted against Index]
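A sketch of the rate model fitted to the lung cancer data transcribed from the table in these slides. The data frame name lc and the age-group labels are choices made here for illustration; the variable names Cases, Population, City and Age follow the glm() call on the slide.

```r
# Lung cancer data transcribed from the slides: 4 cities x 6 age groups
lc <- data.frame(
  City = factor(rep(c("Fredericia", "Horsens", "Kolding", "Vejle"), each = 6)),
  Age  = factor(rep(c("40-54", "55-59", "60-64", "65-69", "70-74", "74+"), times = 4)),
  Cases = c(11, 11, 11, 10, 11, 10,
            13,  6, 15, 10, 12,  2,
             4,  8,  7, 11,  9, 12,
             5,  7, 10, 14,  8,  7),
  Population = c(3059,  800, 710, 581, 509, 605,
                 2879, 1083, 923, 834, 634, 782,
                 3142, 1050, 895, 702, 535, 659,
                 2520,  878, 839, 631, 539, 619)
)

# Poisson glm for the rate: log(Population) is the offset
lc.glm <- glm(Cases ~ offset(log(Population)) + City + Age,
              family = poisson(link = "log"), data = lc)

print(exp(coef(lc.glm)))     # rate ratios relative to the baseline city and age group
print(df.residual(lc.glm))   # 24 observations minus 9 parameters
```

Exponentiated coefficients are interpreted as multiplicative changes in the cancer rate, not in the raw count.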


Plots: Cook's distance

> plot(cooks.distance(lc.glm), type = "h", lwd = 2,
+     col = "blue")

[Figure: cooks.distance(lc.glm) plotted against Index]

Plots: QQ plots

> library(statmod)
> qqnorm(qresid(lc.glm))

[Figure: Normal Q-Q Plot of the quantile residuals; Sample Quantiles against Theoretical Quantiles]


Other models

We have looked at fitting glms to
- Proportions
- Counts
- Rates

Can also fit glms to
- Positive continuous data (family=Gamma or family=inverse.gaussian)
- Overdispersed counts (family=quasipoisson)
- Overdispersed proportions (family=quasibinomial)
- Positive continuous data with exact zeros (family=tweedie using package statmod)
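As one illustration of these extensions, a quasi-Poisson fit keeps the Poisson coefficient estimates but estimates the dispersion from the data instead of fixing it at 1. The data frame toy below is hypothetical, invented only to show the mechanics.

```r
# Hypothetical counts that vary more than a Poisson model would suggest
toy <- data.frame(x = 1:8, y = c(1, 0, 6, 2, 12, 3, 20, 9))

fit.pois  <- glm(y ~ x, family = poisson,      data = toy)
fit.quasi <- glm(y ~ x, family = quasipoisson, data = toy)

# Same point estimates under both families
print(all.equal(coef(fit.pois), coef(fit.quasi)))

# Dispersion estimated as the Pearson statistic divided by the residual df
phi_hat <- summary(fit.quasi)$dispersion
pearson <- sum(residuals(fit.pois, type = "pearson")^2) / df.residual(fit.pois)
print(c(phi_hat = phi_hat, pearson = pearson))
```

Standard errors from the quasi-Poisson summary are the Poisson ones scaled by the square root of the estimated dispersion.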
