Part One
In the first Lab, you recreated some of the displays and preliminary analysis of Arbuthnot’s
baptism data. Your assignment involves repeating these steps, but for present-day birth
records in the United States. Load the present-day data with the following command.
source("http://www.openintro.org/stat/data/present.R")
dim(present)
## [1] 63 3
names(present)
2. How do these counts compare to Arbuthnot’s (in our Lab)? Are they on a similar scale?
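answer: Both data sets have the same number of variables (3), but the present-day data covers fewer years (63 rows). The counts are not on a similar scale: the U.S. data records millions of births per year, while Arbuthnot’s records only thousands.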
3. Does Arbuthnot’s observation about boys being born in greater proportion than girls
hold up in the U.S.?
answer: Yes, it holds in every year of the data: more boys than girls were born each year.
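The claim can be checked in code rather than by reading the table. The following is a minimal sketch, assuming a data frame with year, boys and girls columns like present; toy_births and its counts are made up for illustration and are not the real data.

```r
# Toy stand-in for `present` (made-up counts, for illustration only)
toy_births <- data.frame(
  year  = c(2001, 2002, 2003),
  boys  = c(1050, 1100, 1075),
  girls = c(1000, 1040, 1030)
)

# TRUE only if boys outnumber girls in every row (year)
boys_always_more <- all(toy_births$boys > toy_births$girls)
print(boys_always_more)

# Years (if any) where the pattern fails; empty when it always holds
exception_years <- toy_births$year[toy_births$boys <= toy_births$girls]
print(exception_years)
```

With the real data, replacing toy_births with present should run the same check.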
4. Make a plot that displays the boy-to-girl ratio for every year in the data set. What do
you see?
answer:
plot(present$year, present$boys/present$girls)
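A line plot with axis labels and a reference line at 1 can make the ratio easier to read. This is only a sketch on made-up numbers: toy_births is a hypothetical stand-in for present, and the styling choices (line type, dashed reference line) are one option among many.

```r
# Toy stand-in for `present` (made-up counts, for illustration only)
toy_births <- data.frame(
  year  = 2001:2004,
  boys  = c(1050, 1100, 1075, 1090),
  girls = c(1000, 1040, 1030, 1035)
)

# Boy-to-girl ratio per year
toy_births$ratio <- toy_births$boys / toy_births$girls

# Line plot with axis labels; the dashed line marks equal numbers of boys and girls
plot(toy_births$year, toy_births$ratio, type = "l",
     xlab = "Year", ylab = "Boy-to-girl ratio",
     ylim = range(c(1, toy_births$ratio)))
abline(h = 1, lty = 2)
```

A curve that stays above the dashed line means more boys than girls in every year shown.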
5. In what year did we see the largest total number of births in the U.S.? You can refer to
the help files or the R reference card (http://cran.r-project.org/doc/contrib/Short-refcard.pdf)
to find helpful commands.
hint: You can try the command “which.max”
answer:
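which.max(present$boys + present$girls)  # row index of the maximum, not the year
## [1] 22
present$year[which.max(present$boys + present$girls)]  # the year stored in that row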
Part Two
In the second part, we will use the cdc data set for the exercises.
source("http://www.openintro.org/stat/data/cdc.R")
Double-check which variables are available in the cdc data set.
names(cdc)
## [1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight"
## [7] "wtdesire" "age" "gender"
1. Make a scatterplot of weight versus desired weight. Describe the relationship between
these two variables.
answer:
plot(cdc$wtdesire, cdc$weight)
2. Let’s consider a new variable: the difference between desired weight (wtdesire) and
current weight (weight). Create this new variable by subtracting the two columns in
the data frame and assigning them to a new object called wdiff.
answer:
cdc$wdiff <- cdc$wtdesire - cdc$weight  # desired minus current, in the order the question states
3. What type of data is wdiff? If an observation’s wdiff is 0, what does this mean about the
person’s weight and desired weight? What if wdiff is positive or negative?
answer:
typeof(cdc$wdiff)  # inspect the wdiff column itself; typeof(cdc) only reports "list" for the whole data frame
print('If wdiff=0, it means the person has achieved his/her ideal weight')
## [1] "If wdiff=0, it means the person has achieved his/her ideal weight"
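The meaning of a positive or negative wdiff depends on the order of subtraction. The sketch below assumes wdiff is desired weight minus current weight, as the question words it; with the opposite order the signs flip. toy_cdc, its values and the goal column are hypothetical and only for illustration.

```r
# Toy stand-in for `cdc` (made-up weights in pounds, for illustration only)
toy_cdc <- data.frame(
  weight   = c(180, 150, 120),
  wtdesire = c(160, 150, 130)
)

# Assumption: wdiff = desired weight minus current weight
toy_cdc$wdiff <- toy_cdc$wtdesire - toy_cdc$weight

# Translate the sign of wdiff into words
toy_cdc$goal <- ifelse(toy_cdc$wdiff < 0, "wants to lose weight",
                ifelse(toy_cdc$wdiff > 0, "wants to gain weight",
                       "at desired weight"))
print(toy_cdc)

# wdiff is a numerical variable (a difference of two weights)
class(toy_cdc$wdiff)
```

Under this convention, a negative wdiff means the person weighs more than they would like, and a positive wdiff means they weigh less than they would like.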
4. Describe the distribution of wdiff in terms of its center, shape, and spread, including
any plots you use. What does this tell us about how people feel about their current
weight?
answer:
hist(cdc$wdiff)
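The histogram shows the shape; numerical summaries can back up statements about center and spread. This is a minimal sketch on made-up values, where toy_wdiff is a hypothetical stand-in for cdc$wdiff and the variable names are chosen only for this example.

```r
# Toy stand-in for cdc$wdiff (made-up values in pounds, for illustration only)
toy_wdiff <- c(-40, -20, -10, -10, 0, 0, 0, 5, 10)

# Center: the median resists outliers better than the mean
center_median <- median(toy_wdiff)
center_mean   <- mean(toy_wdiff)

# Spread: standard deviation and interquartile range
spread_sd  <- sd(toy_wdiff)
spread_iqr <- IQR(toy_wdiff)

# Shape: a mean below the median hints at a longer left tail
left_skew_hint <- center_mean < center_median

print(c(median = center_median, mean = center_mean,
        sd = spread_sd, IQR = spread_iqr))
```

The same calls applied to cdc$wdiff would give the figures needed to describe the real distribution.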
5. Using numerical summaries and a side-by-side box plot, determine if men tend to view
their weight differently than women.
hint: You can see an example at http://homepages.gac.edu/~anienow2/MCS_142/R/R-boxplot2.html.
answer:
summary(cdc$wdiff)
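The overall summary does not yet separate men from women. One possible way to do that, sketched here on made-up data, is tapply() for per-group summaries and the formula form of boxplot() for the side-by-side plot. toy_cdc, its values and the gender labels "m" and "f" are assumptions for illustration.

```r
# Toy stand-in for `cdc` (made-up values, for illustration only)
toy_cdc <- data.frame(
  gender = factor(c("m", "m", "m", "f", "f", "f")),
  wdiff  = c(-10, 0, -5, -20, -10, -15)
)

# Numerical summaries of wdiff computed separately for each gender
by_gender <- tapply(toy_cdc$wdiff, toy_cdc$gender, summary)
print(by_gender)

# Medians per gender, handy for a one-line comparison
medians <- tapply(toy_cdc$wdiff, toy_cdc$gender, median)
print(medians)

# Side-by-side box plot: one box per level of gender
boxplot(wdiff ~ gender, data = toy_cdc,
        xlab = "Gender", ylab = "wdiff (pounds)")
```

With the real data, the same calls on cdc$wdiff and cdc$gender would show whether the two groups differ in center and spread.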
6. Now it’s time to get creative. Find the mean and standard deviation of weight and
determine what proportion of the weights are within one standard deviation of the
mean.
answer:
mean(cdc$weight)
## [1] 169.683
sd(cdc$weight)
## [1] 40.08097
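The mean and standard deviation still need to be turned into a proportion. A minimal sketch of that last step, on made-up weights: toy_weight is a hypothetical stand-in for cdc$weight, and the other names are chosen only for this example.

```r
# Toy stand-in for cdc$weight (made-up weights in pounds, for illustration only)
toy_weight <- c(120, 150, 160, 170, 180, 240)

w_mean <- mean(toy_weight)
w_sd   <- sd(toy_weight)

# TRUE where a weight lies within one standard deviation of the mean
within_one_sd <- abs(toy_weight - w_mean) <= w_sd

# The mean of a logical vector is the proportion of TRUE values
prop_within <- mean(within_one_sd)
print(prop_within)
```

For the assignment itself, the same idea reads mean(abs(cdc$weight - mean(cdc$weight)) <= sd(cdc$weight)).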