
Liao HW 5

1. Firesale
a)
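# do() and shuffle() come from the mosaic package; compareMean() is from older
# mosaic releases (newer ones offer diffmean() instead)
library(mosaic)
realestate=read.delim("http://sites.williams.edu/rdeveaux/files/2014/09/Real.Estate.txt")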
with(subset(realestate,Fireplace==TRUE),hist(Price))

with(subset(realestate,Fireplace==FALSE),hist(Price))

Both distributions are skewed to the right. In general, houses without a fireplace are lower
in price than houses with a fireplace.
compareMean(Price~Fireplace,data=realestate)
## [1] 65261

The mean price of houses with fireplaces is 65260.61 higher than the mean price of houses
without fireplaces.
b) The assumption is not unreasonable, but there are other factors or lurking variables,
such as the loft size, the age of the house, and the living area, that also affect the price of
the house. I agree to a certain extent that installing a fireplace will increase the price of
the house, but how much it increases depends on these other variables.
difference=do(10000)*compareMean(Price~shuffle(Fireplace),data=realestate)
## Loading required package: parallel
hist(difference$result)
points(65260.61,0,pch=17)

c) In the histogram of the 10000 shuffled differences between the two means, almost all
of the differences are under 20000. The observed difference of 65260.61 lies far outside
that range, so it is very unlikely to be due to chance alone.
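The shuffling argument can be turned into an approximate p-value. The following is a minimal base-R sketch of the same idea on synthetic stand-in data (the group sizes, means, and spreads are made up for illustration and are not taken from the real-estate file); `diff_means` is a hypothetical helper, not a mosaic function.

```r
# Synthetic stand-in data (NOT the real-estate file): prices for two groups
set.seed(1)
price     <- c(rnorm(60, mean = 150000, sd = 40000),   # hypothetical "no fireplace" group
               rnorm(60, mean = 215000, sd = 40000))   # hypothetical "fireplace" group
fireplace <- rep(c(FALSE, TRUE), each = 60)

# Difference in group means (TRUE group minus FALSE group)
diff_means <- function(y, g) mean(y[g]) - mean(y[!g])
observed   <- diff_means(price, fireplace)

# Shuffle the labels many times to build the distribution expected under chance
shuffled <- replicate(10000, diff_means(price, sample(fireplace)))

# Two-sided p-value: share of shuffles at least as extreme as the observed value
p_value <- mean(abs(shuffled) >= abs(observed))
p_value
```

A p-value near zero means that almost none of the shuffled differences are as large as the observed one, which is the pattern described in the histogram above.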
d)
with(realestate,plot(Price,Living.Area,pch=19))

with(realestate,cor(Price,Living.Area))
## [1] 0.7124

Yes, I think that the correlation coefficient is an appropriate measure of the strength of the
relationship. Looking at the scatterplot, the association is tighter at the lower prices, and
there are a couple of outliers at the higher prices. To calculate the correlation coefficient
more accurately, I would remove the outliers first and redo the calculation.
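To see how much a single outlier can move a correlation, here is a small sketch using Anscombe's third data set, which ships with base R as `anscombe` (it is not the real-estate data). It compares the Pearson correlation with and without the one extreme point, and adds the rank-based Spearman correlation as an alternative that is less sensitive to outliers.

```r
# Anscombe's third data set (datasets::anscombe), used only as an illustration
x <- anscombe$x3
y <- anscombe$y3

r_all <- cor(x, y)                        # Pearson correlation with every point

# Drop the single most extreme y value (the one outlier in this set)
keep   <- y < max(y)
r_trim <- cor(x[keep], y[keep])           # Pearson correlation without the outlier

r_rank <- cor(x, y, method = "spearman")  # rank-based correlation on all points

c(all = r_all, trimmed = r_trim, spearman = r_rank)
```

Reporting the coefficient both with and without the suspect points, rather than silently dropping them, makes it clear how much the conclusion depends on them.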
2. Traffic Headaches
No, it is not appropriate to summarize the strength of the association with a correlation,
because there are a lot of outliers in the upper right quadrant that would affect the
correlation coefficient. The relationship between these variables may also not be linear.
3. Performance IQ Scores vs. Brain Size
a)

iq=read.delim("http://sites.williams.edu/rdeveaux/files/2014/09/Performance_IQ_scores_vs_brain_size.txt")
with(iq,plot(Performace.IQ,Brain.Size.pixels.,pch=19))

b) There is not much association between Performance IQ and Brain Size. Looking at the
scatterplot, the points are all over the place and even when divided into quadrants, one
cannot really see a certain trend or association.
c)
with(iq,cor(Performace.IQ,Brain.Size.pixels.))
## [1] 0.3868

The correlation coefficient is 0.3868173, which indicates a weak to moderate positive linear
association: larger brain sizes tend to go with higher Performance IQ scores, but with a lot
of scatter.
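One rough way to gauge a correlation of this size is to square it, which gives the share of the variation in one variable that a straight-line fit on the other would account for. A short sketch using the value reported above:

```r
r <- 0.3868173          # correlation reported above
r_squared <- r^2        # share of variance accounted for by a straight-line fit
round(r_squared, 3)
```

The result is about 0.15, so a linear fit would account for roughly 15% of the variation, leaving most of it unexplained.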
4. Kentucky Derby
a)

kentuckyderby=read.delim("http://sites.williams.edu/rdeveaux/files/2014/09/Kentucky_Derby_2014.txt")
with(kentuckyderby,plot(Speed..mph.,Year.,pch=19))

b) There is a fairly strong positive correlation between year and speed: as the year
increases, so does the speed. The relationship can be described as linear, but that would
not be the most accurate description because there is a slight curve.
c) No, performance has not improved at the same rate throughout history. In the
scatterplot above, the curve is steeper in the later years. Because Year is on the vertical
axis, a steeper curve means that more years pass for each gain in speed, so performance is
improving more slowly in the later years.
d) I don't think that a correlation coefficient is appropriate here because the relationship
in the graph is not linear but closer to quadratic.
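The point that a correlation coefficient can miss curvature can be illustrated with Anscombe's second data set, which ships with base R as `anscombe` (it is not the Kentucky Derby data). The sketch below compares a straight-line fit with a quadratic fit on that curved data.

```r
# Anscombe's second data set: a smooth curve, used only as an illustration
x <- anscombe$x2
y <- anscombe$y2

r <- cor(x, y)                       # Pearson correlation treats the trend as linear

fit_line <- lm(y ~ x)                # straight-line fit
fit_quad <- lm(y ~ x + I(x^2))       # quadratic fit

r2_line <- summary(fit_line)$r.squared
r2_quad <- summary(fit_quad)$r.squared

c(r = r, r2_line = r2_line, r2_quad = r2_quad)
```

The correlation looks respectable on its own, yet the quadratic fit accounts for nearly all of the variation. When a scatterplot bends, the coefficient understates how tightly the points follow the curve.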
