Sunteți pe pagina 1din 83

icebreakeR: An Introduction to R

Andrew Robinson
Department of Mathematics & Statistics University of Melbourne

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

1 / 78

Outline
1 2 3 4 5 6 7 8 9 10 11 12

Introduction Infrastructure Showcase Interface Conclusion Classes and Objects Object Types Programming Descriptions Graphics Linear Models Mixed-Effects Models
icebreakeR: An Introduction to R 2 / 78

Andrew Robinson (University of Melbourne)

Introduction

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

3 / 78

What is R?

R is a programming language that has been optimized for data analysis and modeling. R can be used as an object-oriented programming language, or as a statistical environment within which sets of instructions can be performed automatically.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

4 / 78

Why use R?

1 2

R runs on Windows, Mac-OS, and Unix variants; R provides a vast number of useful statistical tools;
1

many of which have been painstakingly tested;

3 4 5 6 7 8

R produces publication-quality graphics in a variety of formats;


A R plays well with L TEX via the Sweave package; R plays well with FORTRAN, C, and shell scripts;

R scales, making it useful for small and large projects; R is object-oriented; R eschews the GUI.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

5 / 78

Why avoid R?

Frustration!
1 2 3 4 5

R cannot do everything; R will not hold your hand; The documentation can be opaque; R can drive you crazy, or age you prematurely; The contributed packages have been exposed to varying degrees of testing and analysis; R stores objects in RAM; R eschews the GUI.

6 7

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

6 / 78

A contrast

1 2

R is object-oriented; SAS is PROCedure-oriented;

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

7 / 78

Infrastructure

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

8 / 78

Organization

File structure: project


data, documents, graphics, images, notes, scripts.

Exercise 1

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

9 / 78

Getting going and stopping


Getting Going
Starting R Adding Packages

Exercise 2

Stopping
Cancelling operations Quitting

Exercise 3

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

10 / 78

Working Directory

Where it all begins.

Relevant commands
getwd() setwd("new working directory ")

Exercise 4

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

11 / 78

Showcase

Work through the exercise in the Showcase chapter. Reect on the commonalities between the exercise and what you need from R.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

12 / 78

Getting Local Help

R comes with internal help. Use the examples.

Relevant commands
help(object ) ?object help.search("phrase ") help.start()

Exercise 5

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

13 / 78

Remote Help

The Internet is a vast repository of advice. Most of it is good.

Relevant commands
RSiteSearch() Google
R-help . . .

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

14 / 78

Responsive Help

A list-server exists. Information is available at: https://stat.ethz.ch/mailman/listinfo/r-help

Relevant issues
Post questions as a last resort. http://www.r-project.org/posting-guide.html

Exercise 6

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

15 / 78

Writing Scripts

Why write scripts?

Relevant issues
source("script-name.R ", echo=TRUE) Comments: #

Exercise 7

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

16 / 78

Work Spaces

The workspace is the container of all your objects.

Relevant commands
ls() rm(object ) rm(list=ls()) save.image() save(object ) load()

Exercise 8

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

17 / 78

History

The past.

Relevant commands
savehistory() loadhistory()

Exercise 9

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

18 / 78

Extensibility

R starts speedily.

Relevant commands
require(package ) installed.packages() available.packages() install.packages(package )

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

19 / 78

Interface

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

20 / 78

Import and Export

Import
read.xxx ()

Exercise 10

Export
write.xxx () pdf("pdf-name.pdf ") . . . dev.off()

Exercise 11

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

21 / 78

R is hard work

Work hard!

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

22 / 78

R is hard work

Work hard!

That means:
Read widely. Use the resources available. Experiment patiently and exibly. Keep scripts. Comment generously.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

22 / 78

What is an Object?

Characteristics Everything. Objects are realizations of a class. All objects have attributes.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

23 / 78

Why use Objects?

Using objects simplies many complicated problems.


1 2 3

Communication. Comparison. Coercion.

But you have to put them somewhere!

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

24 / 78

Assignment

Relevant commands
name <- definition class(object ) Every object that is followed by () is a function, being called. Every object that is followed by [] is being sub- or super-setted.

Exercise 12

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

25 / 78

Object Types

Atomistic
Numeric. String. Factor. Integer. Logical. Missing.

Exercise 13

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

26 / 78

Object Types

Containers
Vector
Vectorization

Exercise 14
Dataframe.

Exercise 15
Matrix. Array. List.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

27 / 78

Messing with Data

merge() reshape() order()

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

28 / 78

Programming

Write your own functions. Flow control Scoping Class control

Exercise 16

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

29 / 78

Data Descriptions

Simple data descriptions are readily available. Univariate


Numerical Categorical

Multivariate
Numerical/Numerical Numerical/Categorical Categorical/Categorical

Exercise 17

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

30 / 78

Graphics

All the little pieces . . .


plot(x, y, . . .) xlim=, ylim= xlab=, ylab= main= col= pch=, lty=, . . .

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

31 / 78

Graphics: Organization

par(. . .) las=1 mfrow=c(2,2) mar=c(4,4,3,2) new=TRUE

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

32 / 78

Graphics: Augmentation

plot(x, y, . . .) points() axis() mtext() box() legend()

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

33 / 78

Graphics: Permanance

Right-click the graphic, copy as windows metale, and paste to a document. Or . . . pdf("pdf-name.pdf ") plot(x, y, . . .) dev.off()

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

34 / 78

Graphics: Challenges

Error Bars Colour by groups

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

35 / 78

Graphics: Contributions

trellis (lattice package) grammar of graphics (ggplot, ggplot2 packages)

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

36 / 78

Graphics: Exercises

Exercise 18 Exercise 19

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

37 / 78

Linear Models

name <- lm(y x )

Useful Options
data = dataframe na.action = na.exclude

Occasional Options
subset = logical or index weights = weights formula = terms(y x, keep.order = TRUE)

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

38 / 78

Diagnostics Code

plot(model )

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

39 / 78

Diagnostics Interpretation

10 0 10 20 30

Residuals

q 117 q q qq qq qqq q q q q qq q q q q q q q q q q q q qq q q q q q q q qq qq q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq qqq q q q q q q q q qq q q q q q q q q q q q q qq q q q q q q q q q q qq q qq q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q

Standardized residuals

Residuals vs Fitted

Normal QQ
2 0 2 4 6
qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q

117 q

376 q 415 q

q 376 q 415

15 Standardized residuals

20

25

30

35

40

45 Standardized residuals

ScaleLocation Fitted values


2.5 2.0 1.5 1.0 0.5 0.0
415 q 376 q
q q 117 q q q qq q q q q q q q q q qq q q q q q qq qq q q q q q q q q q q q qq q q q q q qq q q q q q q q q qq qq q q q q qq q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q qq q q q q q q qq q q qq q q q q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q qq q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q qq q q q q q qq q q q q q qq q q q q q q q q q q q q q qq q q q qq q qq q q q q qq q q qq q q q q q q q qq qq qq q q qq q qq q q q

Residuals vs Leverage Theoretical Quantiles


2 0 2 4 6 8
q q q q q q q q qq q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q qq q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q 376 q 415 q q q q

245 q 0.5

Cook's distance 0.02 0.03 0.04

15

20

25

30

35

40

45

0.00

0.01

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

40 / 78

Estimation 1
> summary(hd.lm) Call: lm(formula = height.m ~ dbh.cm, data = ufc) Residuals: Min 1Q -33.5257 -2.8619

Median 0.1320

3Q 2.8512

Max 13.3206

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 12.67570 0.56406 22.47 <2e-16 *** dbh.cm 0.31259 0.01388 22.52 <2e-16 *** --Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1

' '

Residual standard error: 4.941 on 389 degrees of freedom Multiple R-Squared: 0.566, Adjusted R-squared: 0.5649 F-statistic: 507.4 on 1 and 389 DF, p-value: < 2.2e-16
Andrew Robinson (University of Melbourne) icebreakeR: An Introduction to R 42 / 78

Prediction

predict(model, newdata=new-dataframe, . . .)

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

43 / 78

Inference

> anova(hd.lm) Analysis of Variance Table Response: height.m Df Sum Sq Mean Sq F value Pr(>F) dbh.cm 1 12388.1 12388.1 507.38 < 2.2e-16 *** Residuals 389 9497.7 24.4 --Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1

' '

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

45 / 78

Exercise

Exercise 20

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

46 / 78

Characteristics

Mixed-eects models incorporate two kinds of predictor variables. Fixed eects - speak for themselves. Random eects - represent a population.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

47 / 78

Necessity
Natural resources data commonly have hierarchical structure. Trees within plots within stands within forests. Times within trees . . . Mixed-eects models enable the modeling of correlated data without violation of important regression assumptions.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

48 / 78

Necessity
Natural resources data commonly have hierarchical structure. Trees within plots within stands within forests. Times within trees . . . Mixed-eects models enable the modeling of correlated data without violation of important regression assumptions.

Regression assumptions.
True relationship is linear. Residuals are normally distributed. Residuals have identical distribution (variance). Residuals are independent.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

48 / 78

Utility

Mixed eects models allow the estimation of useful quantities. Variance components. Intra-class correlation.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

49 / 78

Yet another perspective

Construct a height-diameter relationship using two randomly selected plots in a forest, and that we have measured three trees on each. Growing conditions are quite dierent on the plots, leading to a systematic dierence between the height-diameter relationship on each. If we t a simple regression to the trees then we obtain a residual/tted value plot. If we t a simple regression to the trees with an intercept for each plot then we obtain a residual/tted value plot.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

50 / 78

50

0.5

q q

45

q q

40

Height (m)

Residuals

Residuals
q

0.5

35

0.0

30

q q q

25

29

30

31

32

33

34

35

36

30

35 Fitted Values

40

45

1.0

25

30

35

40

45

50

Diameter (cm)

Fitted Values

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

51 / 78

Decomposition 1

Note that the model specication implies that: yij y ij = ij and The true relationship is linear.
i

(1)

N (0, 2 )
i

The

are independent.

Clearly not true.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

52 / 78

Decomposition 2

What if we could make: i + ij yij y ij = b Then we merely need to assume that: The true relationship is linear.
2) bi N (0, b ij

(2)

N (0, 2 )
ij

The

are independent.

Much more tenable!

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

53 / 78

Decomposition 3

The assumptions are satised because the systematic dierences between the plots, which previously produced correlation, are now accounted for by the new random eects. However, when the time comes to use the model for prediction, we do not need to know the plot identity, as the xed eects do not require it.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

54 / 78

Data - Height/Diameter from Stage (1963)

A brief synopsis: a sample of 66 trees was selected in national forests around northern and central Idaho. According to Stage (pers. comm. 2003), the trees were selected purposively. The habitat type and diameter at 46 were also recorded for each tree, as was the national forest from which it came. Each tree was then split, and decadal measures were made of height and diameter inside bark at 46 .

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

55 / 78

Scatterplot
60 50 Height (m) 40 30 20 10 0
q q q q

q q q q q q q qq q q q q q qq q q qq q q q q q q qqqq q q qq q q q q q q q qq q q qq q q q q q q q q q q qq q qq q qq q qq qq q q q q q q q q q q q q qq q q q q q qq q q q qq q q q q qq qq q q qq q q qq qq qqq qq q q q q q q q q qq q q q q q q qq q q q q q q q q q q q q q q q q qq q q q q q q qq q q q q q q q qq q q q q q q q q q q q q q q qq q q q q q q q q q q q q q qqq q q q q q q q q q qq q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q qq q qq q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q qq q q q qq q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q

20

40 Dbhib (cm)

60

80

Figure: Al Stages Grand Fir stem analysis data: height (m) against diameter (cm). These were dominant and co-dominant trees.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

56 / 78

Another look
Kaniksu
60 60 Ts/Pac Ts/Op

Coeur d'Alene
AG/Pach Ts/Pac PA/Pach 60

St. Joe
Ts/Pac PA/Pach Th/Pach

40

40

20

20

20

40

60

80

20

40

60

80

0 0

20

40

20

40

60

80

Clearwater
60 60 PA/Pach Th/Pach

Nez Perce
Th/Pach AG/Pach PA/Pach 60

Clark Fork
PA/Pach Ts/Pac

40

40

20

20

20

40

60

80

20

40

60

80

0 0

20

40

20

40

60

80

Figure: Al Stages Grand Fir Stem Analysis Data: height (ft, vertical axes) against diameter (inches, horizontal axes) by National Forest. These were dominant and co-dominant trees.
Andrew Robinson (University of Melbourne) icebreakeR: An Introduction to R 57 / 78

Model for getting it wrong in R

hi = 0 + 1 di +

(3)

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

58 / 78

Model for getting it wrong in R

hi = 0 + 1 di +

(3)

Regression assumptions.
True relationship is linear.
i

N (0, 2 )

Cov ( i , j ) = 0 for i = j

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

58 / 78

Diagnostics for getting it wrong in R


Standardized residuals

Residuals vs Fitted
155 q 156 q 157 q q q q q q qq q q q q qqq q q q qqq q q q q q qqq q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q qq q q q qq q q q q qq qq q q q q q q qq qq q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q qqq q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q qq q q q q q qq q qq q q q q qq q q q qq q q q q q q q q q q q qq q q q q q q q q q q q q q qq q qq qq q q qq q qq q q q q q q q qq q q q q q q q q q qqq q q q qqq q qqq qq q q q qq

Normal QQ
4 2 0 2
155 q 156 157 qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq

Residuals

10

10

10 Standardized residuals

20

30

40

50

60 Standardized residuals

0.0

155 q 156 q 157 q q q q q qq q qq qq q qq q q q q q q q q q q q q q qqq qqq q qqq q q qqq q q q qq q q q q q q qq q qq q q q q q qq q q q q q q qq q q q qq q q q q q q q q qq q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q qq q q q qq q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q qq q q q q q q qq qq q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q qq q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q qq q q q q q qq q q qq q qq q qq q qq q q q q q qq q q q q q q q q q q q q q q q q q q q qq q q q q q q qq q qq q q q qq q q q q qq q qq q q q q q q q q q qq q q q qq q q q

ScaleLocation Fitted values

Residuals vs Leverage Theoretical Quantiles


4
q q q q q q q q q q qq qq q q q q q qq q q q q q q q q q q q q qq q q q q q q q q q q q q q q qq q q q qq q q q q q q q q q q q q q qq q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q qq q q q q q q qq q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q qq q q q q qq q q q q q qq qq q q q q q 468 245 469 q q

1.0

Cook's distance 0.010

10

20

30

40

50

60

0.000

0.005

0.015

0.020

Figure: Popular regression diagnostics from R.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

59 / 78

Model for getting it less wrong in R

hit = 0 + (1 + b1i ) dit +

it

(4)

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

60 / 78

Model for getting it less wrong in R

hit = 0 + (1 + b1i ) dit +

it

(4)

Regression assumptions.
True relationship is linear.
2 ) b1i N (0, b 1 it

N (0, 2 )
jt ) ig )

Cov ( it , Cov ( it ,

= 0 for i = j = 0 for t = g

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

60 / 78

Assumptions for getting it less wrong in R

Now, the key assumptions that were making are that:


1 2 3

the model structure is correctly specied the tree and forest random eects are normally distributed, the tree random eects are homoscedastic within the forest random eects. the inner-most residuals are normally distributed, the inner-most residuals are homoscedastic within and across the tree random eects. the innermost residuals are independent within the groups.

4 5

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

61 / 78

Diagnostics for getting it less wrong in R

Model Structure (I)


Innermost Residuals
60
q q qq q q qq q q q q qq q qq q qq q q qq q q q q q q q q q q qq q q q q q q q q q qq q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q qq qqq q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q

Model Structure (II)


4 2 0 2
q q q q

Observed Values

50 40 30 20 10 0

q q q q q q q q q q q q q qq q q q q q q q q qq q q q q q qq q qq q q q q q q q q q q q q q q q q qq q qq q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q qq q q q q q q q q q q qq q q q qq q q q q

10 20 30 40 50 60

10

30

50

70

Fitted Values

Fitted Values

Figure: Useful regression diagnostics from R.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

62 / 78

Diagnostics for getting it less wrong in R

QQ plot: Tree
q

QQ plot: Residuals
q

Sample Quantiles

1 0 1 2
q

q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq

Sample Quantiles

4 2 0 2
q

q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q

Theoretical Quantiles

Theoretical Quantiles

Figure: More useful regression diagnostics from R.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

63 / 78

Diagnostics for getting it less wrong in R


2.1291 0.5629 0.1856 1.2749

Variance of withinTree Residuals

Innermost Residuals

q q q q

4
q q q

0
q q

0 2

q q q q q qq q q q q qq q q q q q qq q q q q qqqq q q q qq q q qq q q q q q q q q q qqq q q qq q q q

71 12 65 70 36 18 15

Figure: More useful regression diagnostics from R.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

64 / 78

Diagnostics for getting it less wrong in R

Autocorrelation
1.0 0.5 0.0
q q q q q

Correlation

0.5 1.0 0 2 4

q q q q q q

10

Lag

Figure: More useful regression diagnostics from R.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

65 / 78

The roles differ

For the design,


xed eects represent themselves; random eects represent a population.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

66 / 78

The roles differ

Within the model,


xed eects explain variation; random eects organize unexplained variation.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

67 / 78

Another perspective

Random eects are eects that common sense says will explain variation, but you dont want to have to know them in order to be able to apply the model.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

68 / 78

Modelling is much more involved

Add a new dimension to your ow chart!

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

69 / 78

A Modelling Strategy

The modeling strategy depends on the modelers intention.


1

Fit baseline model.


1 2

Include the meaningful xed eects. Include the design random eects.

2 3

Check the assumption diagnostics. Add or modify random components until diagnostics are satised.
1 2 3

a heteroskedastic variance structure (several candidates) a correlation structure (several candidates) extra random eects (e.g. random slopes)

4 5

Consider adding more xed eects. Re-examine the diagnostics, add/modify random eects, etc.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

70 / 78

Basic Model Statement


Y = X + Zb +

b N (0, D) N (0, R)

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

71 / 78

Basic Model Statement


Y = X + Zb +

b N (0, D) N (0, R)

Design Matrices
X allocates the xed eects. Z allocates the random eects.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

71 / 78

Basic Model Statement


Y = X + Zb +

b N (0, D) N (0, R)

Design Matrices
X allocates the xed eects. Z allocates the random eects.

Covariance Matrices
D describes the random eects covariance. R allocates the residuals covariance.
Andrew Robinson (University of Melbourne) icebreakeR: An Introduction to R 71 / 78

Y = X + Zb +

Var (Y | X, Z, , b) = R

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

72 / 78

Y = X + Zb +

Var (Y | X, Z, , b) = R Var (Y | X, ) = ZDZ + R = V

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

72 / 78

Log Likelihood

1 n 1 L (, V | Y, X) = ln (|V|) ln (2 ) (Y X ) V1 (Y X ) 2 2 2

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

73 / 78

Profile out

= (X V1 X)1 X V1 Y

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

74 / 78

is gone!

L (, V | Y, X) = f (V, Y, X, Z, D, R)

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

75 / 78

is gone!

by substitution. by maximization and then Estimate V

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

76 / 78

ReML

Maximum likelihood estimators of covariance parameters are usually negatively biased.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

77 / 78

ReML

Briey, ReML involves applying ML, but replacing Y with KY; X with 0; Z with K Z; and V with K VK where K is such that K X = 0.

Andrew Robinson (University of Melbourne)

icebreakeR: An Introduction to R

78 / 78

S-ar putea să vă placă și