
APSC254 – Lab (II) Report

Aiman Ridwan Mohd Hafash, (70965850) aiman.ridwan@alumni.ubc.ca

Part One
In the first Lab, you recreated some of the displays and preliminary analysis of Arbuthnot’s
baptism data. Your assignment involves repeating these steps, but for present day birth
records in the United States. Load up the present day data with the following command.
source("http://www.openintro.org/stat/data/present.R")

The data are stored in a data frame called present.


1. What years are included in this data set? What are the dimensions of the data frame
and what are the variable or column names?
answer:
head(present$year)

## [1] 1940 1941 1942 1943 1944 1945

print('The years are from 1940 until 2002')

## [1] "The years are from 1940 until 2002"

dim(present)

## [1] 63 3

names(present)

## [1] "year" "boys" "girls"
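Because head() shows only the first six years, range() is a more direct way to confirm both endpoints. A minimal sketch on a stand-in data frame (the name toy and its counts are made up for illustration; with the real data, substitute present):

```r
# Stand-in for `present`: same column names, made-up counts (illustration only).
toy <- data.frame(year  = 1940:2002,
                  boys  = seq(1200, by = 10, length.out = 63),
                  girls = seq(1150, by = 10, length.out = 63))

range(toy$year)   # first and last year in a single call
dim(toy)          # number of rows (years) and columns (variables)
names(toy)        # column names
```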

2. How do these counts compare to Arbuthnot’s (in our Lab)? Are they on a similar scale?
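answer: Both data frames have the same three variables (year, boys, girls), but present covers fewer years (63 rows) than Arbuthnot’s data. The counts are not on a similar scale: annual U.S. births number in the millions, far more than the baptisms recorded in Arbuthnot’s London data.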
3. Does Arbuthnot’s observation about boys being born in greater proportion than girls
hold up in the U.S.?
answer: Yes. Boys outnumber girls in every year of the data set.
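One way to back this claim with a command rather than by eye is to test the comparison for every row at once. The sketch below uses a small made-up data frame called toy (a hypothetical stand-in; replace it with present for the real check):

```r
# Made-up counts for illustration only.
toy <- data.frame(year  = 2000:2002,
                  boys  = c(105, 110, 108),
                  girls = c(100, 104, 103))

all(toy$boys > toy$girls)          # TRUE only if boys outnumber girls in every row
toy$year[toy$boys <= toy$girls]    # years, if any, where the claim fails
```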
4. Make a plot that displays the boy-to-girl ratio for every year in the data set. What do
you see?
answer:
plot(present$year, present$boys/present$girls)
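The plot is easier to read with axis labels and a reference line at a ratio of 1 (equal numbers of boys and girls). A possible version, shown on made-up counts in a hypothetical data frame toy (swap in present for the real figure):

```r
# Made-up counts for illustration only.
toy <- data.frame(year  = 2000:2004,
                  boys  = c(105, 110, 108, 112, 109),
                  girls = c(100, 104, 103, 107, 105))

ratio <- toy$boys / toy$girls

# Extend the y-axis so the reference line at 1 is always visible.
plot(toy$year, ratio, type = "l",
     xlab = "Year", ylab = "Boy-to-girl ratio",
     ylim = range(c(1, ratio)))
abline(h = 1, lty = 2)   # dashed line: equal numbers of boys and girls
```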
5. In what year did we see the largest total number of births in the U.S.? You can refer to
the help files or the R reference card (http://cran.r-project.org/doc/contrib/Short-refcard.pdf)
to find helpful commands.
hint: You can try the command “which.max”.
answer:
which.max(present$boys + present$girls)

## [1] 22

print('This refers to Year 1961')

## [1] "This refers to Year 1961"
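which.max() returns a row index, not a year, so the index still has to be mapped back to the year column. A short sketch of doing that in one step, using made-up counts in a hypothetical data frame toy (use present for the real answer):

```r
# Made-up counts for illustration only.
toy <- data.frame(year  = 2000:2004,
                  boys  = c(10, 14, 12, 11, 13),
                  girls = c(9, 13, 11, 10, 12))

total <- toy$boys + toy$girls     # total births per year
toy$year[which.max(total)]        # year with the largest total (2001 for these toy counts)
```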

Part Two
In the second part, we will use the cdc data set for the exercise.
source("http://www.openintro.org/stat/data/cdc.R")

You want to double check what variables are available in the data set cdc.
names(cdc)
## [1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight"
## [7] "wtdesire" "age" "gender"

1. Make a scatterplot of weight versus desired weight. Describe the relationship between
these two variables.
answer:
plot(cdc$wtdesire, cdc$weight)

print('The two variables have a strong positive correlation')

## [1] "The two variables have a strong positive correlation"
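The strength of the relationship can also be quantified with cor(), and a line with slope 1 marks where desired weight equals current weight. The sketch below runs on a few made-up rows in a hypothetical data frame toy (replace it with cdc for the real data):

```r
# Made-up weights for illustration only.
toy <- data.frame(weight   = c(120, 150, 180, 210, 240),
                  wtdesire = c(115, 140, 165, 185, 200))

r <- cor(toy$wtdesire, toy$weight)   # Pearson correlation coefficient
r

plot(toy$wtdesire, toy$weight,
     xlab = "Desired weight", ylab = "Current weight")
abline(a = 0, b = 1, lty = 2)        # points on this line have weight equal to wtdesire
```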

2. Let’s consider a new variable: the difference between desired weight (wtdesire) and
current weight (weight). Create this new variable by subtracting the two columns in
the data frame and assigning them to a new object called wdiff.
answer:
cdc$wdiff <- cdc$weight-cdc$wtdesire

3. What type of data is wdiff? If an observation’s wdiff is 0, what does this mean about the
person’s weight and desired weight? What if wdiff is positive or negative?
answer:
is.numeric(cdc$wdiff)
## [1] TRUE

print('wdiff is numerical data')

## [1] "wdiff is numerical data"

print('If wdiff = 0, the person is at his/her desired weight')

## [1] "If wdiff = 0, the person is at his/her desired weight"

print('If wdiff < 0, the person weighs less than the desired weight')

## [1] "If wdiff < 0, the person weighs less than the desired weight"

print('If wdiff > 0, the person weighs more than the desired weight')

## [1] "If wdiff > 0, the person weighs more than the desired weight"
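To see how many people fall into each of the three cases, the sign of wdiff can be tabulated. A minimal sketch on made-up rows in a hypothetical data frame toy, using the same sign convention as above (weight minus wtdesire):

```r
# Made-up weights for illustration only.
toy <- data.frame(weight   = c(150, 180, 130, 200, 165),
                  wtdesire = c(150, 160, 140, 170, 165))

toy$wdiff <- toy$weight - toy$wtdesire   # positive: heavier than desired

is.numeric(toy$wdiff)     # confirms the column is numerical
table(sign(toy$wdiff))    # -1: below desired weight, 0: at it, 1: above it
```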

4. Describe the distribution of wdiff in terms of its center, shape, and spread, including
any plots you use. What does this tell us about how people feel about their current
weight?
answer:
hist(cdc$wdiff)

print('Unimodal and right-skewed. The center is between 0 and 50, which means that most people feel they are overweight')

## [1] "Unimodal and right-skewed. The center is between 0 and 50, which means that most people feel they are overweight"
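Center and spread can be stated numerically as well as read off the histogram. The sketch below uses a made-up vector (hypothetical values, not the cdc data); with the real data, pass cdc$wdiff instead:

```r
# Made-up differences for illustration only.
wdiff <- c(-20, 0, 0, 5, 10, 10, 15, 20, 40, 120)

median(wdiff)                  # center, robust to extreme values
IQR(wdiff)                     # spread of the middle 50% of the values
mean(wdiff) > median(wdiff)    # TRUE is consistent with a right skew
```

The median and IQR are used here because a few extreme values can pull the mean and standard deviation considerably.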

5. Using numerical summaries and a side-by-side box plot, determine if men tend to view
their weight differently than women.
hint: You can see an example at http://homepages.gac.edu/~anienow2/MCS_142/R/R-
boxplot2.html.
answer:
summary(cdc$wdiff)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -500.00    0.00   10.00   14.59   21.00  300.00

boxplot(cdc$wdiff ~ cdc$gender, ylab="Weight Difference", xlab="Genders")
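The summary above covers everyone together, so a per-group summary is needed to compare men and women numerically. One way to do this is tapply(), sketched here on made-up rows in a hypothetical data frame toy (the gender labels "m" and "f" are assumed; adjust them to the levels in cdc):

```r
# Made-up values for illustration only.
toy <- data.frame(gender = factor(c("m", "m", "m", "f", "f", "f")),
                  wdiff  = c(0, 10, 20, 5, 15, 40))

tapply(toy$wdiff, toy$gender, summary)        # five-number summary and mean per group
med <- tapply(toy$wdiff, toy$gender, median)  # group medians for a quick comparison
med
```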

6. Now it’s time to get creative. Find the mean and standard deviation of weight and
determine what proportion of the weights are within one standard deviation of the
mean.
answer:
mean(cdc$weight)

## [1] 169.683

sd(cdc$weight)
## [1] 40.08097

print('Weights within one standard deviation of the mean range from 129.6 to 209.8')

## [1] "Weights within one standard deviation of the mean range from 129.6 to 209.8"
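The interval alone does not give the proportion the question asks for. The proportion can be computed by flagging each weight that falls inside the interval and taking the mean of the flags. A sketch on a made-up vector w (hypothetical values; use cdc$weight for the real answer):

```r
# Made-up weights for illustration only.
w <- c(120, 140, 150, 160, 170, 180, 190, 200, 230, 260)

m <- mean(w)
s <- sd(w)

# TRUE for each weight inside [mean - sd, mean + sd]
within_one_sd <- w >= m - s & w <= m + s

mean(within_one_sd)   # proportion of TRUE values (0.7 for these toy weights)
```

Taking the mean of a logical vector works because R treats TRUE as 1 and FALSE as 0.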

7. Extra questions to think about. No need to write on the report.


• What concepts from the textbook are covered in this lab?
• What concepts, if any, are not covered in the textbook?
• Have you seen these concepts elsewhere, e.g. lecture, discussion section, previous labs,
or homework problems?
