Lab Report For APSC 254

APSC254 – Lab (II) Report
Aiman Ridwan Mohd Hafash, (70965850) aiman.ridwan@alumni.ubc.ca
Part One
In the first Lab, you recreated some of the displays and preliminary analysis of Arbuthnot’s
baptism data. Your assignment involves repeating these steps, but for present day birth
records in the United States. Load up the present day data with the following command.
source("http://www.openintro.org/stat/data/present.R")
The data are stored in a data frame called present.

1. What years are included in this data set? What are the dimensions of the data frame
and what are the variable or column names?
answer:
head(present$year)
## [1] 1940 1941 1942 1943 1944 1945
print('The years are from 1940 until 2002')
## [1] "The years are from 1940 until 2002"
dim(present)
## [1] 63 3
names(present)
## [1] "year" "boys" "girls"
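Because head() shows only the first six years, the full span is easier to confirm with range(). The following is a minimal sketch that uses a small made-up data frame (toy, with hypothetical counts) in place of present so that it runs without a download.

```r
# Hypothetical stand-in for `present` (made-up counts, for illustration only)
toy <- data.frame(year  = 2001:2004,
                  boys  = c(105, 110, 108, 102),
                  girls = c(100, 104, 103, 99))

year_span <- range(toy$year)  # smallest and largest year in the data
n_rows    <- nrow(toy)        # number of observations (one per year)
n_cols    <- ncol(toy)        # number of variables
print(year_span)
```

Calling range(present$year) in the same way gives the first and last year of the real data.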
2. How do these counts compare to Arbuthnot’s (in our Lab)? Are they on a similar scale?
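answer: Both data frames have the same three variables (year, boys, girls), but present has
fewer observations (63 years, against 82 in Arbuthnot’s data). The counts are not on a
similar scale: the U.S. counts are in the millions per year, while Arbuthnot’s are in the
thousands.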
3. Does Arbuthnot’s observation about boys being born in greater proportion than girls
hold up in the U.S.?
answer: Yes. Boys outnumber girls in every year of the data set.
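A claim about every year can be checked with a logical comparison. This is a sketch on a made-up data frame (toy, hypothetical counts) standing in for present.

```r
# Hypothetical stand-in for `present` (made-up counts, for illustration only)
toy <- data.frame(year  = 2001:2004,
                  boys  = c(105, 110, 108, 102),
                  girls = c(100, 104, 103, 99))

more_boys <- toy$boys > toy$girls  # TRUE for each year with more boys than girls
all(more_boys)                     # TRUE only if the comparison holds in every year
```

The same two lines with present in place of toy would test the real data.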
4. Make a plot that displays the boy-to-girl ratio for every year in the data set. What do
you see?
answer:
plot(present$year, present$boys/present$girls)
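A line plot with axis labels can make a trend over time easier to read than the default scatterplot. The sketch below uses a made-up data frame (toy, hypothetical counts) in place of present; the labels are only suggestions.

```r
# Hypothetical stand-in for `present` (made-up counts, for illustration only)
toy <- data.frame(year  = 2001:2004,
                  boys  = c(105, 110, 108, 102),
                  girls = c(100, 104, 103, 99))

ratio <- toy$boys / toy$girls  # boy-to-girl ratio for each year
plot(toy$year, ratio, type = "l",
     xlab = "Year", ylab = "Boy-to-girl ratio")
```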
5. In what year did we see the most total number of births in the U.S.? You can refer to
the help files or the R reference card
(http://cran.r-project.org/doc/contrib/Short-refcard.pdf) to find helpful commands.
hint: You can try the command “which.max”
answer:
which.max(present$boys + present$girls)
## [1] 22
print('Row 22 corresponds to the year 1961')
## [1] "Row 22 corresponds to the year 1961"
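which.max() returns a row index rather than a year, so the index can be used to look up the year directly. This is a sketch on a made-up data frame (toy, hypothetical counts) standing in for present.

```r
# Hypothetical stand-in for `present` (made-up counts, for illustration only)
toy <- data.frame(year  = 2001:2004,
                  boys  = c(105, 110, 108, 102),
                  girls = c(100, 104, 103, 99))

total     <- toy$boys + toy$girls  # total births per year
idx       <- which.max(total)      # row index of the largest total
peak_year <- toy$year[idx]         # year stored in that row
print(peak_year)
```

With present in place of toy, the last line would print the year itself, with no need to count rows from 1940.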
Part Two
In the second part, we will use the cdc data set for the exercise.
source("http://www.openintro.org/stat/data/cdc.R")
You want to double check what variables are available in the data set cdc.
names(cdc)
## [1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight"
## [7] "wtdesire" "age" "gender"
1. Make a scatterplot of weight versus desired weight. Describe the relationship between
these two variables.
answer:
plot(cdc$wtdesire, cdc$weight)
print('The two variables are strongly positively correlated')
## [1] "The two variables are strongly positively correlated"
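The strength of a linear relationship can be quantified with cor(). This is a minimal sketch on a made-up data frame (toy, hypothetical weights in the same two columns as cdc).

```r
# Hypothetical stand-in for `cdc` (made-up weights, for illustration only)
toy <- data.frame(weight   = c(150, 180, 200, 130),
                  wtdesire = c(140, 170, 180, 130))

r <- cor(toy$wtdesire, toy$weight)  # Pearson correlation coefficient
print(r)
```

A value close to 1 would support describing the relationship as strong and positive.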
2. Let’s consider a new variable: the difference between desired weight (wtdesire) and
current weight (weight). Create this new variable by subtracting the two columns in
the data frame and assigning them to a new object called wdiff.
answer:
cdc$wdiff <- cdc$weight - cdc$wtdesire   # current weight minus desired weight
3. What type of data is wdiff? If an observation wdiff is 0, what does this mean about the
person’s weight and desired weight. What if wdiff is positive or negative?
answer:
typeof(cdc)   # storage type of the whole data frame, not of wdiff
## [1] "list"
print('wdiff is numerical data')
## [1] "wdiff is numerical data"
print('If wdiff = 0, the person is at his/her desired weight')
## [1] "If wdiff = 0, the person is at his/her desired weight"
print('If wdiff < 0, the person weighs less than his/her desired weight')
## [1] "If wdiff < 0, the person weighs less than his/her desired weight"
print('If wdiff > 0, the person weighs more than his/her desired weight')
## [1] "If wdiff > 0, the person weighs more than his/her desired weight"
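The type of a single variable is best checked on the column itself rather than on the data frame. This sketch uses a made-up data frame (toy, hypothetical weights) and the same definition of wdiff as above (current weight minus desired weight).

```r
# Hypothetical stand-in for `cdc` (made-up weights, for illustration only)
toy <- data.frame(weight   = c(150, 180, 200, 130),
                  wtdesire = c(140, 170, 180, 130))
toy$wdiff <- toy$weight - toy$wtdesire  # current minus desired weight

is_num <- is.numeric(toy$wdiff)  # TRUE for a numerical column
signs  <- sign(toy$wdiff)        # 1: above desired, 0: at desired, -1: below
print(is_num)
print(signs)
```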
4. Describe the distribution of wdiff in terms of its center, shape, and spread, including
any plots you use. What does this tell us about how people feel about their current
weight?
answer:
hist(cdc$wdiff)
print('Unimodal and right skewed. The center is between 0 and 50, which means that most people feel that they are overweight.')
## [1] "Unimodal and right skewed. The center is between 0 and 50, which means that most people feel that they are overweight."
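Center and spread can also be stated numerically. For a skewed distribution, the median and interquartile range are usually more informative than the mean and standard deviation. This is a sketch on made-up values (toy_wdiff is hypothetical).

```r
# Hypothetical wdiff values (made up, for illustration only)
toy_wdiff <- c(10, 10, 20, 0)

center <- median(toy_wdiff)  # robust measure of center
spread <- IQR(toy_wdiff)     # width of the middle 50% of the values
print(center)
print(spread)
```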
5. Using numerical summaries and a side-by-side box plot, determine if men tend to view
their weight differently than women.
hint: You can see an example at
http://homepages.gac.edu/~anienow2/MCS_142/R/R-boxplot2.html.
answer:
summary(cdc$wdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -500.00    0.00   10.00   14.59   21.00  300.00
boxplot(cdc$wdiff ~ cdc$gender, ylab="Weight Difference", xlab="Genders")
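The summary above pools both genders, so a per-group summary is needed to compare men and women numerically. One way to do this is tapply(), sketched here on a made-up data frame (toy, with hypothetical values and gender labels "m" and "f", which may differ from the labels used in cdc).

```r
# Hypothetical stand-in for `cdc` (made-up values, for illustration only)
toy <- data.frame(wdiff  = c(10, 10, 20, 0),
                  gender = c("m", "f", "m", "f"))

# Median weight difference computed separately for each gender
by_gender <- tapply(toy$wdiff, toy$gender, median)
print(by_gender)
```

Replacing median with summary would give the full five-number summary and mean for each group.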
6. Now it’s time to get creative. Find the mean and standard deviation of weight and
determine what proportion of the weights are within one standard deviation of the
mean.
answer:
mean(cdc$weight)
## [1] 169.683
sd(cdc$weight)
## [1] 40.08097
print('Weights within one standard deviation of the mean lie between 129.6 and 209.8')
## [1] "Weights within one standard deviation of the mean lie between 129.6 and 209.8"
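The proportion itself can be obtained by taking the mean of a logical vector that marks the weights inside the interval. This is a minimal sketch on made-up weights (w is hypothetical); the same steps would apply to cdc$weight.

```r
# Hypothetical weights (made up, for illustration only)
w <- c(130, 150, 160, 170, 180, 190, 250)

m <- mean(w)  # sample mean
s <- sd(w)    # sample standard deviation

# TRUE for each weight strictly within one standard deviation of the mean
within_one_sd <- w > m - s & w < m + s

# The mean of a logical vector is the proportion of TRUE values
prop <- mean(within_one_sd)
print(prop)
```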
7. Extra questions to think about. No need to write on the report.

• What concepts from the textbook are covered in this lab?
• What concepts, if any, are not covered in the textbook?
• Have you seen these concepts elsewhere, e.g. lecture, discussion section, previous labs,
or homework problems?
