RStudio
R Pruim
JSM 2014
Peering Around the Bend
Is There a Light at the End of the Tunnel?
Or are we adrift at sea?
Some Important Questions . . .
. . . that I’m (mostly) not going to answer
rflip(6)
Flipping 6 coins [ Prob(Heads) = 0.5 ] ...
T H T H H T
Number of Heads: 3 [Proportion Heads: 0.5]

do(2) * rflip(6)
  n heads tails prop
1 6     3     3  0.5
2 6     3     3  0.5

coins <- do(1000) * rflip(6)
tally(~heads, data = coins)
  0   1   2   3   4   5   6
 17  98 243 346 194  91  11

tally(~heads, data = coins, format = "perc")
   0    1    2    3    4    5    6
 1.7  9.8 24.3 34.6 19.4  9.1  1.1

tally(~(heads >= 5 | heads <= 1), data = coins)
 TRUE FALSE

histogram(~heads, data = coins, width = 1,
          groups = (heads >= 5 | heads <= 1))
[Plot: density histogram of heads (0 to 6), grouped by whether heads >= 5 or heads <= 1]

        substance
sex      alcohol cocaine heroin
  female      36      41     30
  male       141     111     94

p.value
1.932e-30

mean(age ~ sex, data = HELPrct)
female   male
 36.25  35.47

diffmean(age ~ sex, data = HELPrct)
diffmean
 -0.7841

favstats(age ~ sex, data = HELPrct)
  .group min Q1 median   Q3 max  mean    sd   n missing
1 female  21 31     35 40.5  58 36.25 7.585 107       0
2   male  19 30     35 40.0  60 35.47 7.750 346       0

confint(t.test(~age, data = HELPrct))
mean of x lower upper level
    35.65 34.94 36.37  0.95

densityplot(~age | sex, groups = substance, data = HELPrct, auto.key = TRUE)
[Plot: density of age (10 to 70), panels female and male, one curve per substance: alcohol, cocaine, heroin]

bwplot(age ~ substance | sex, data = HELPrct)
[Plot: boxplots of age by substance, panels female and male]

plotDist("chisq", df = 4)
[Plot: chi-squared density with 4 degrees of freedom, x from 0 to 20]

model <- lm(weight ~ height + gender, data = Heightweight)
wt <- makeFun(model)
wt(height = 72, gender = "male")
    1
179.1

xyplot(weight ~ height, groups = gender, data = Heightweight)
plotFun(wt(h, gender = "male") ~ h, add = TRUE, col = "skyblue")
plotFun(wt(h, gender = "female") ~ h, add = TRUE, col = "navy")
[Plot: weight vs. height (55 to 75) with fitted lines for males (skyblue) and females (navy)]
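The tallies above can be checked by hand: the "extreme" outcomes (at most 1 or at least 5 heads) account for 17 + 98 + 91 + 11 = 217 of the 1000 simulated sets. The base-R sketch below uses only the counts printed on the slide (it does not rerun the simulation) and compares that simulated proportion with the exact binomial probability for 6 fair coins, (1 + 6 + 6 + 1)/64 = 14/64.

```r
# Counts of heads from tally(~heads, data = coins) above (1000 sets of 6 flips)
counts <- c(`0` = 17, `1` = 98, `2` = 243, `3` = 346, `4` = 194, `5` = 91, `6` = 11)

# Percentages, matching tally(~heads, data = coins, format = "perc")
perc <- 100 * counts / sum(counts)

# Simulated proportion of extreme outcomes: heads <= 1 or heads >= 5
extreme <- c("0", "1", "5", "6")
sim_prop <- sum(counts[extreme]) / sum(counts)

# Exact binomial probability of the same event for 6 fair coins
exact_prop <- sum(dbinom(c(0, 1, 5, 6), size = 6, prob = 0.5))

print(sim_prop)    # 0.217
print(exact_prop)  # 0.21875
```

The simulated and exact values agree to about two decimal places, which is what one would expect from 1000 repetitions.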
The Most Important Template
goal( y ~ x , data = mydata )
Other versions:
# simpler version
goal( ~ x, data = mydata )
# fancier version
goal( y ~ x | z , data = mydata )
# unified version
goal( formula , data = mydata )
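To make the template concrete, here is a hypothetical, base-R-only helper. `formula_mean()` is not a mosaic function and is not how mosaic implements its formula-aware `mean()`; it is a sketch that dispatches on the two simplest shapes of the formula, `~ x` and `y ~ x`, and does not handle the `| z` form. The data frame `toy` is invented for illustration.

```r
# Hypothetical helper (NOT part of mosaic): a formula-aware mean that
# follows the goal( formula , data = mydata ) template.
formula_mean <- function(formula, data) {
  vars <- all.vars(formula)
  if (length(formula) == 2) {
    # One-sided formula, ~ x: mean of x
    mean(data[[vars[1]]])
  } else {
    # Two-sided formula, y ~ x: mean of y within each level of x
    tapply(data[[vars[1]]], data[[vars[2]]], mean)
  }
}

# Toy data, invented for illustration
toy <- data.frame(age = c(30, 40, 35, 45),
                  sex = c("female", "female", "male", "male"))

print(formula_mean(~ age, data = toy))      # 37.5
print(formula_mean(age ~ sex, data = toy))  # female 35, male 40
```

A one-sided formula object in R has length 2 (`~` and its right-hand side) and a two-sided one has length 3, which is what the branch tests.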
2 Questions
What do you want R to do? (goal)
Example: How do we tell R to make this plot?
[Plot: births (7000 to 10000) vs. date, January through the following January]
• a scatter plot
Your turn: How do you make this plot?
[Plot: boxplots of age (20 to 60) for each substance: alcohol, cocaine, heroin]
Two Questions?
bwplot( age ~ substance, data=HELPrct)
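Base R's `boxplot()` also accepts the same `goal( y ~ x , data = mydata )` shape. A minimal sketch with a small made-up data frame (`toy` is hypothetical, not HELPrct); `plot = FALSE` returns the computed statistics without drawing anything.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  age = c(25, 31, 38, 29, 44, 52, 35, 41, 33),
  substance = rep(c("alcohol", "cocaine", "heroin"), each = 3)
)

# Same template: goal( y ~ x , data = mydata )
b <- boxplot(age ~ substance, data = toy, plot = FALSE)

print(b$names)       # group labels: alcohol, cocaine, heroin
print(b$stats[3, ])  # row 3 of stats holds the group medians: 31 44 35
```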
Your turn: How about this one?
[Plot: horizontal boxplots of age (20 to 60), one per substance: alcohol, cocaine, heroin]
bwplot( substance ~ age, data=HELPrct )
Graphical Summaries: One Variable
histogram( ~ age, data=HELPrct)
[Plot: histogram of age (20 to 60) on a density scale]
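The vertical axis in these histograms is labeled Density. On a density scale, each bar's height times its width is the proportion of observations in that bin, so the bar areas sum to 1. A base-R sketch using `hist(..., plot = FALSE)` on hypothetical toy ages (not HELPrct):

```r
# Toy ages, invented for illustration
ages <- c(21, 24, 27, 30, 31, 33, 35, 35, 36, 38, 40, 42, 45, 50, 58)

# Compute the histogram without drawing it; bins are 10 years wide
h <- hist(ages, breaks = seq(20, 60, by = 10), plot = FALSE)

# Bar area = density * bin width = proportion of observations in the bin
areas <- h$density * diff(h$breaks)
print(h$counts)    # 4 7 3 1
print(areas)
print(sum(areas))  # 1
```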
Two Variables
[Plot: density plots of age (10 to 70), panels female and male, one curve per substance: alcohol, cocaine, heroin]
Bells & Whistles
Lots available
• titles
• axis labels
• colors
• sizes
• transparency
• etc, etc.
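`bwplot()`, `xyplot()`, and `densityplot()` are lattice functions, and in lattice the bells and whistles listed above are ordinary arguments added to the same template call. A minimal sketch with hypothetical toy data; `main`, `xlab`, `ylab`, `col`, `cex`, and `alpha` are standard lattice/R graphics arguments, and the particular values are arbitrary.

```r
library(lattice)  # ships with R as a recommended package

# Toy data, invented for illustration
toy <- data.frame(height = c(60, 64, 68, 72),
                  weight = c(120, 140, 160, 180))

p <- xyplot(weight ~ height, data = toy,
            main  = "Weight vs. height",  # title
            xlab  = "height",             # axis labels
            ylab  = "weight",
            col   = "navy",               # color
            cex   = 1.5,                  # point size
            alpha = 0.5)                  # transparency

print(class(p))  # "trellis"; print(p) itself would draw the plot
```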
My approach:
[Plot: births (7000 to 10000) vs. date, January through the following January]
Numerical Summaries: One Variable
Big idea: Replace plot name with summary name
[1] 35.65
[Plot: histogram of age (20 to 60) on a density scale]
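Base R's `aggregate()` also accepts the `y ~ x, data =` shape, with the summary passed as a `FUN` argument rather than used as the function name. A sketch with hypothetical toy data (not HELPrct):

```r
# Toy data, invented for illustration
toy <- data.frame(age = c(30, 40, 35, 45),
                  sex = c("female", "female", "male", "male"))

# Same formula shape; the summary is supplied through FUN
res <- aggregate(age ~ sex, data = toy, FUN = mean)
print(res)  # female 35, male 40
```

Compared with replacing the plot name by the summary name, this keeps the formula but moves the "goal" into an argument.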
Other Summaries
The mosaic package includes formula-aware versions of mean(),
sd(), var(), min(), max(), sum(), IQR(), . . .
It also provides favstats(), which computes several favorite summaries at once.
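A rough base-R analogue of `favstats()` for a single numeric vector, written as a hypothetical helper `fav_like()`. Its column names mimic the favstats output shown earlier, but mosaic's own implementation may differ in details such as the quantile method and how missing values are handled.

```r
# Hypothetical helper (NOT part of mosaic), loosely mimicking favstats()
fav_like <- function(x) {
  ok <- x[!is.na(x)]  # drop missing values before summarizing
  q <- quantile(ok, c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(min = min(ok), Q1 = q[1], median = q[2], Q3 = q[3],
             max = max(ok), mean = mean(ok), sd = sd(ok),
             n = length(ok), missing = sum(is.na(x)))
}

print(fav_like(c(1, 2, 3, 4, 5, NA)))
```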
        substance
sex      alcohol cocaine heroin
  female      36      41     30
  male       141     111     94
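The row totals of this table match the `n` column of the `favstats(age ~ sex, data = HELPrct)` output shown earlier: 36 + 41 + 30 = 107 women and 141 + 111 + 94 = 346 men. A quick base-R check using only the printed counts, plus the row proportions (the share of each sex using each substance):

```r
# Counts copied from the sex-by-substance table above
counts <- matrix(c(36, 41, 30,
                   141, 111, 94),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("female", "male"),
                                 substance = c("alcohol", "cocaine", "heroin")))

print(rowSums(counts))                           # female 107, male 346
print(round(prop.table(counts, margin = 1), 3))  # each row sums to 1
```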
One Template to Rule a Lot
• single and multiple variable graphical summaries
• single and multiple variable numerical summaries
• linear models
female male
36.25 35.47
(Intercept) sexmale
36.2523 -0.7841
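These two outputs are connected. In a linear model of age on sex (presumably `lm(age ~ sex, data = HELPrct)`), with female as the baseline level, the intercept is the female mean and the `sexmale` coefficient is the male-minus-female difference, the same -0.7841 that `diffmean()` reported. From the printed values, 36.2523 + (-0.7841) = 35.4682, which rounds to the male mean 35.47. A base-R sketch of the same relationship on hypothetical toy data:

```r
# Toy data, invented for illustration (not HELPrct)
toy <- data.frame(age = c(30, 40, 35, 45),
                  sex = c("female", "female", "male", "male"))

fit <- lm(age ~ sex, data = toy)               # "female" is the baseline level
group_means <- tapply(toy$age, toy$sex, mean)  # female 35, male 40

print(coef(fit))
# Intercept equals the baseline (female) mean ...
print(all.equal(unname(coef(fit)["(Intercept)"]), unname(group_means["female"])))
# ... and the sexmale coefficient equals male mean minus female mean
print(all.equal(unname(coef(fit)["sexmale"]),
                unname(group_means["male"] - group_means["female"])))
```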
Modeling
Modeling is really the starting point for the mosaic design.
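The call `wt <- makeFun(model)` followed by `wt(height=72, gender="male")` turns a fitted model into an ordinary R function of the model's predictors. A hypothetical base-R stand-in, `make_predictor()` (not a mosaic function), wraps `predict()`; unlike makeFun it hard-codes the predictor names `height` and `gender`. The toy data are invented so that the fit is exact and the prediction is easy to verify by hand.

```r
# Hypothetical stand-in for makeFun() (NOT part of mosaic); predictor
# names are hard-coded for this one model.
make_predictor <- function(model) {
  function(height, gender) {
    newdata <- data.frame(height = height, gender = gender)
    unname(predict(model, newdata = newdata))
  }
}

# Toy data: weight = 2 * height, plus 10 for males (exact, so the fit is perfect)
toy <- data.frame(
  height = c(60, 65, 70, 62, 68, 72),
  gender = c("female", "female", "female", "male", "male", "male")
)
toy$weight <- 2 * toy$height + ifelse(toy$gender == "male", 10, 0)

model <- lm(weight ~ height + gender, data = toy)
wt <- make_predictor(model)
print(wt(height = 72, gender = "male"))    # 154
print(wt(height = 60, gender = "female"))  # 120
```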
• Co-conspirators