Documente Academic
Documente Profesional
Documente Cultură
Andrew Robinson
Department of Mathematics & Statistics University of Melbourne
icebreakeR: An Introduction to R
1 / 78
Outline
1 2 3 4 5 6 7 8 9 10 11 12
Introduction Infrastructure Showcase Interface Conclusion Classes and Objects Object Types Programming Descriptions Graphics Linear Models Mixed-Effects Models
icebreakeR: An Introduction to R 2 / 78
Introduction
icebreakeR: An Introduction to R
3 / 78
What is R?
R is a programming language that has been optimized for data analysis and modeling. R can be used as an object-oriented programming language, or as a statistical environment within which sets of instructions can be performed automatically.
icebreakeR: An Introduction to R
4 / 78
Why use R?
1 2
R runs on Windows, Mac-OS, and Unix variants; R provides a vast number of useful statistical tools;
1
3 4 5 6 7 8
R scales, making it useful for small and large projects; R is object-oriented; R eschews the GUI.
icebreakeR: An Introduction to R
5 / 78
Why avoid R?
Frustration!
1 2 3 4 5
R cannot do everything; R will not hold your hand; The documentation can be opaque; R can drive you crazy, or age you prematurely; The contributed packages have been exposed to varying degrees of testing and analysis; R stores objects in RAM; R eschews the GUI.
6 7
icebreakeR: An Introduction to R
6 / 78
A contrast
1 2
icebreakeR: An Introduction to R
7 / 78
Infrastructure
icebreakeR: An Introduction to R
8 / 78
Organization
Exercise 1
icebreakeR: An Introduction to R
9 / 78
Exercise 2
Stopping
Cancelling operations Quitting
Exercise 3
icebreakeR: An Introduction to R
10 / 78
Working Directory
Relevant commands
getwd() setwd("new working directory ")
Exercise 4
icebreakeR: An Introduction to R
11 / 78
Showcase
Work through the exercise in the Showcase chapter. Reect on the commonalities between the exercise and what you need from R.
icebreakeR: An Introduction to R
12 / 78
Relevant commands
help(object ) ?object help.search("phrase ") help.start()
Exercise 5
icebreakeR: An Introduction to R
13 / 78
Remote Help
Relevant commands
RSiteSearch() Google
R-help . . .
icebreakeR: An Introduction to R
14 / 78
Responsive Help
Relevant issues
Post questions as a last resort. http://www.r-project.org/posting-guide.html
Exercise 6
icebreakeR: An Introduction to R
15 / 78
Writing Scripts
Relevant issues
source("script-name.R ", echo=TRUE) Comments: #
Exercise 7
icebreakeR: An Introduction to R
16 / 78
Work Spaces
Relevant commands
ls() rm(object ) rm(list=ls()) save.image() save(object ) load()
Exercise 8
icebreakeR: An Introduction to R
17 / 78
History
The past.
Relevant commands
savehistory() loadhistory()
Exercise 9
icebreakeR: An Introduction to R
18 / 78
Extensibility
R starts speedily.
Relevant commands
require(package ) installed.packages() available.packages() install.packages(package )
icebreakeR: An Introduction to R
19 / 78
Interface
icebreakeR: An Introduction to R
20 / 78
Import
read.xxx ()
Exercise 10
Export
write.xxx () pdf("pdf-name.pdf ") . . . dev.off()
Exercise 11
icebreakeR: An Introduction to R
21 / 78
R is hard work
Work hard!
icebreakeR: An Introduction to R
22 / 78
R is hard work
Work hard!
That means:
Read widely. Use the resources available. Experiment patiently and exibly. Keep scripts. Comment generously.
icebreakeR: An Introduction to R
22 / 78
What is an Object?
Characteristics Everything. Objects are realizations of a class. All objects have attributes.
icebreakeR: An Introduction to R
23 / 78
icebreakeR: An Introduction to R
24 / 78
Assignment
Relevant commands
name <- definition class(object ) Every object that is followed by () is a function, being called. Every object that is followed by [] is being sub- or super-setted.
Exercise 12
icebreakeR: An Introduction to R
25 / 78
Object Types
Atomistic
Numeric. String. Factor. Integer. Logical. Missing.
Exercise 13
icebreakeR: An Introduction to R
26 / 78
Object Types
Containers
Vector
Vectorization
Exercise 14
Dataframe.
Exercise 15
Matrix. Array. List.
icebreakeR: An Introduction to R
27 / 78
icebreakeR: An Introduction to R
28 / 78
Programming
Exercise 16
icebreakeR: An Introduction to R
29 / 78
Data Descriptions
Multivariate
Numerical/Numerical Numerical/Categorical Categorical/Categorical
Exercise 17
icebreakeR: An Introduction to R
30 / 78
Graphics
icebreakeR: An Introduction to R
31 / 78
Graphics: Organization
icebreakeR: An Introduction to R
32 / 78
Graphics: Augmentation
icebreakeR: An Introduction to R
33 / 78
Graphics: Permanance
Right-click the graphic, copy as windows metale, and paste to a document. Or . . . pdf("pdf-name.pdf ") plot(x, y, . . .) dev.off()
icebreakeR: An Introduction to R
34 / 78
Graphics: Challenges
icebreakeR: An Introduction to R
35 / 78
Graphics: Contributions
icebreakeR: An Introduction to R
36 / 78
Graphics: Exercises
Exercise 18 Exercise 19
icebreakeR: An Introduction to R
37 / 78
Linear Models
Useful Options
data = dataframe na.action = na.exclude
Occasional Options
subset = logical or index weights = weights formula = terms(y x, keep.order = TRUE)
icebreakeR: An Introduction to R
38 / 78
Diagnostics Code
plot(model )
icebreakeR: An Introduction to R
39 / 78
Diagnostics Interpretation
10 0 10 20 30
Residuals
Standardized residuals
Residuals vs Fitted
Normal QQ
2 0 2 4 6
qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
117 q
376 q 415 q
q 376 q 415
15 Standardized residuals
20
25
30
35
40
45 Standardized residuals
245 q 0.5
15
20
25
30
35
40
45
0.00
0.01
icebreakeR: An Introduction to R
40 / 78
Estimation 1
> summary(hd.lm) Call: lm(formula = height.m ~ dbh.cm, data = ufc) Residuals: Min 1Q -33.5257 -2.8619
Median 0.1320
3Q 2.8512
Max 13.3206
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 12.67570 0.56406 22.47 <2e-16 *** dbh.cm 0.31259 0.01388 22.52 <2e-16 *** --Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1
' '
Residual standard error: 4.941 on 389 degrees of freedom Multiple R-Squared: 0.566, Adjusted R-squared: 0.5649 F-statistic: 507.4 on 1 and 389 DF, p-value: < 2.2e-16
Andrew Robinson (University of Melbourne) icebreakeR: An Introduction to R 42 / 78
Prediction
predict(model, newdata=new-dataframe, . . .)
icebreakeR: An Introduction to R
43 / 78
Inference
> anova(hd.lm) Analysis of Variance Table Response: height.m Df Sum Sq Mean Sq F value Pr(>F) dbh.cm 1 12388.1 12388.1 507.38 < 2.2e-16 *** Residuals 389 9497.7 24.4 --Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1
' '
icebreakeR: An Introduction to R
45 / 78
Exercise
Exercise 20
icebreakeR: An Introduction to R
46 / 78
Characteristics
Mixed-eects models incorporate two kinds of predictor variables. Fixed eects - speak for themselves. Random eects - represent a population.
icebreakeR: An Introduction to R
47 / 78
Necessity
Natural resources data commonly have hierarchical structure. Trees within plots within stands within forests. Times within trees . . . Mixed-eects models enable the modeling of correlated data without violation of important regression assumptions.
icebreakeR: An Introduction to R
48 / 78
Necessity
Natural resources data commonly have hierarchical structure. Trees within plots within stands within forests. Times within trees . . . Mixed-eects models enable the modeling of correlated data without violation of important regression assumptions.
Regression assumptions.
True relationship is linear. Residuals are normally distributed. Residuals have identical distribution (variance). Residuals are independent.
icebreakeR: An Introduction to R
48 / 78
Utility
Mixed eects models allow the estimation of useful quantities. Variance components. Intra-class correlation.
icebreakeR: An Introduction to R
49 / 78
Construct a height-diameter relationship using two randomly selected plots in a forest, and that we have measured three trees on each. Growing conditions are quite dierent on the plots, leading to a systematic dierence between the height-diameter relationship on each. If we t a simple regression to the trees then we obtain a residual/tted value plot. If we t a simple regression to the trees with an intercept for each plot then we obtain a residual/tted value plot.
icebreakeR: An Introduction to R
50 / 78
50
0.5
q q
45
q q
40
Height (m)
Residuals
Residuals
q
0.5
35
0.0
30
q q q
25
29
30
31
32
33
34
35
36
30
35 Fitted Values
40
45
1.0
25
30
35
40
45
50
Diameter (cm)
Fitted Values
icebreakeR: An Introduction to R
51 / 78
Decomposition 1
Note that the model specication implies that: yij y ij = ij and The true relationship is linear.
i
(1)
N (0, 2 )
i
The
are independent.
icebreakeR: An Introduction to R
52 / 78
Decomposition 2
What if we could make: i + ij yij y ij = b Then we merely need to assume that: The true relationship is linear.
2) bi N (0, b ij
(2)
N (0, 2 )
ij
The
are independent.
icebreakeR: An Introduction to R
53 / 78
Decomposition 3
The assumptions are satised because the systematic dierences between the plots, which previously produced correlation, are now accounted for by the new random eects. However, when the time comes to use the model for prediction, we do not need to know the plot identity, as the xed eects do not require it.
icebreakeR: An Introduction to R
54 / 78
A brief synopsis: a sample of 66 trees was selected in national forests around northern and central Idaho. According to Stage (pers. comm. 2003), the trees were selected purposively. The habitat type and diameter at 46 were also recorded for each tree, as was the national forest from which it came. Each tree was then split, and decadal measures were made of height and diameter inside bark at 46 .
icebreakeR: An Introduction to R
55 / 78
Scatterplot
60 50 Height (m) 40 30 20 10 0
q q q q
20
40 Dbhib (cm)
60
80
Figure: Al Stages Grand Fir stem analysis data: height (m) against diameter (cm). These were dominant and co-dominant trees.
icebreakeR: An Introduction to R
56 / 78
Another look
Kaniksu
60 60 Ts/Pac Ts/Op
Coeur d'Alene
AG/Pach Ts/Pac PA/Pach 60
St. Joe
Ts/Pac PA/Pach Th/Pach
40
40
20
20
20
40
60
80
20
40
60
80
0 0
20
40
20
40
60
80
Clearwater
60 60 PA/Pach Th/Pach
Nez Perce
Th/Pach AG/Pach PA/Pach 60
Clark Fork
PA/Pach Ts/Pac
40
40
20
20
20
40
60
80
20
40
60
80
0 0
20
40
20
40
60
80
Figure: Al Stages Grand Fir Stem Analysis Data: height (ft, vertical axes) against diameter (inches, horizontal axes) by National Forest. These were dominant and co-dominant trees.
Andrew Robinson (University of Melbourne) icebreakeR: An Introduction to R 57 / 78
hi = 0 + 1 di +
(3)
icebreakeR: An Introduction to R
58 / 78
hi = 0 + 1 di +
(3)
Regression assumptions.
True relationship is linear.
i
N (0, 2 )
Cov ( i , j ) = 0 for i = j
icebreakeR: An Introduction to R
58 / 78
Residuals vs Fitted
155 q 156 q 157 q q q q q q qq q q q q qqq q q q qqq q q q q q qqq q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q qq q q q qq q q q q qq qq q q q q q q qq qq q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q qqq q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q qq q q q q q qq q qq q q q q qq q q q qq q q q q q q q q q q q qq q q q q q q q q q q q q q qq q qq qq q q qq q qq q q q q q q q qq q q q q q q q q q qqq q q q qqq q qqq qq q q q qq
Normal QQ
4 2 0 2
155 q 156 157 qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq
Residuals
10
10
10 Standardized residuals
20
30
40
50
60 Standardized residuals
0.0
1.0
10
20
30
40
50
60
0.000
0.005
0.015
0.020
icebreakeR: An Introduction to R
59 / 78
it
(4)
icebreakeR: An Introduction to R
60 / 78
it
(4)
Regression assumptions.
True relationship is linear.
2 ) b1i N (0, b 1 it
N (0, 2 )
jt ) ig )
Cov ( it , Cov ( it ,
= 0 for i = j = 0 for t = g
icebreakeR: An Introduction to R
60 / 78
the model structure is correctly specied the tree and forest random eects are normally distributed, the tree random eects are homoscedastic within the forest random eects. the inner-most residuals are normally distributed, the inner-most residuals are homoscedastic within and across the tree random eects. the innermost residuals are independent within the groups.
4 5
icebreakeR: An Introduction to R
61 / 78
Observed Values
50 40 30 20 10 0
q q q q q q q q q q q q q qq q q q q q q q q qq q q q q q qq q qq q q q q q q q q q q q q q q q q qq q qq q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q qq q q q q q q q q q q qq q q q qq q q q q
10 20 30 40 50 60
10
30
50
70
Fitted Values
Fitted Values
icebreakeR: An Introduction to R
62 / 78
QQ plot: Tree
q
QQ plot: Residuals
q
Sample Quantiles
1 0 1 2
q
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq
Sample Quantiles
4 2 0 2
q
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q
Theoretical Quantiles
Theoretical Quantiles
icebreakeR: An Introduction to R
63 / 78
Innermost Residuals
q q q q
4
q q q
0
q q
0 2
q q q q q qq q q q q qq q q q q q qq q q q q qqqq q q q qq q q qq q q q q q q q q q qqq q q qq q q q
71 12 65 70 36 18 15
icebreakeR: An Introduction to R
64 / 78
Autocorrelation
1.0 0.5 0.0
q q q q q
Correlation
0.5 1.0 0 2 4
q q q q q q
10
Lag
icebreakeR: An Introduction to R
65 / 78
icebreakeR: An Introduction to R
66 / 78
icebreakeR: An Introduction to R
67 / 78
Another perspective
Random eects are eects that common sense says will explain variation, but you dont want to have to know them in order to be able to apply the model.
icebreakeR: An Introduction to R
68 / 78
icebreakeR: An Introduction to R
69 / 78
A Modelling Strategy
Include the meaningful xed eects. Include the design random eects.
2 3
Check the assumption diagnostics. Add or modify random components until diagnostics are satised.
1 2 3
a heteroskedastic variance structure (several candidates) a correlation structure (several candidates) extra random eects (e.g. random slopes)
4 5
Consider adding more xed eects. Re-examine the diagnostics, add/modify random eects, etc.
icebreakeR: An Introduction to R
70 / 78
b N (0, D) N (0, R)
icebreakeR: An Introduction to R
71 / 78
b N (0, D) N (0, R)
Design Matrices
X allocates the xed eects. Z allocates the random eects.
icebreakeR: An Introduction to R
71 / 78
b N (0, D) N (0, R)
Design Matrices
X allocates the xed eects. Z allocates the random eects.
Covariance Matrices
D describes the random eects covariance. R allocates the residuals covariance.
Andrew Robinson (University of Melbourne) icebreakeR: An Introduction to R 71 / 78
Y = X + Zb +
Var (Y | X, Z, , b) = R
icebreakeR: An Introduction to R
72 / 78
Y = X + Zb +
icebreakeR: An Introduction to R
72 / 78
Log Likelihood
1 n 1 L (, V | Y, X) = ln (|V|) ln (2 ) (Y X ) V1 (Y X ) 2 2 2
icebreakeR: An Introduction to R
73 / 78
Profile out
= (X V1 X)1 X V1 Y
icebreakeR: An Introduction to R
74 / 78
is gone!
L (, V | Y, X) = f (V, Y, X, Z, D, R)
icebreakeR: An Introduction to R
75 / 78
is gone!
icebreakeR: An Introduction to R
76 / 78
ReML
icebreakeR: An Introduction to R
77 / 78
ReML
Briey, ReML involves applying ML, but replacing Y with KY; X with 0; Z with K Z; and V with K VK where K is such that K X = 0.
icebreakeR: An Introduction to R
78 / 78