
Statistics 1040 Term Project

Part 1: Choosing Data


1. Tell me which of the three data sets you have chosen to work with this semester.
Data Set 4: Freshman 15 Data
Data Set 4 explores the well-known expression "Freshman 15." This term describes the
weight gain that takes place during a college student's freshman year. Both
weight (in kilograms) and BMI were collected from multiple male and female students. The
weight and BMI measurements were taken in September, the beginning of the freshman
academic year, and again in April, the end of the freshman academic year.
The results were originally published July 1, 2006 in the Journal of American College Health, volume
55, number 1, page 41, in the article "Changes in Body Weight and Fat Mass of Men and
Women in the First Year of College: A Study of the 'Freshman 15'." The article's authors are
D.J. Hoffman, P. Policastro, V. Quick, and S.K. Lee.
3. Complete the following table for all variables in your chosen data set.
| Variable name in the data set | Describe what the variable means (include units) | Is the variable qualitative or quantitative | What is the level of measurement for this variable |
|---|---|---|---|
| Sex | Gender (female or male) | Qualitative | Nominal |
| Weight in September | Weight in kilograms of both male and female participants in September | Quantitative | Ratio |
| Weight in April | Weight in kilograms of both male and female participants in April | Quantitative | Ratio |
| BMI in September | BMI of both male and female participants in September | Quantitative | Interval |
| BMI in April | BMI of both male and female participants in April | Quantitative | Interval |






Part 2: Graphical Representation of Data

1. Sample 1 Pie Chart

Sample 1 Pareto Chart

2. The second sample was obtained by a systematic approach. The first row (row 1) was
selected, followed by every other row thereafter (counting by twos). This yielded a sample
of only n=34. Because the sample is obtained without replacement, we returned to the
remaining data and used the same approach, picking the first remaining row (row 2), which
brought the sample to n=35.
3. Though not much different, the first sample, a simple random sample of size n=35,
contained 42.86% women and 57.14% men. The second, systematic sample contained 51.43%
women and 48.57% men, and so represents both genders more equally.
4. The graphs obtained from the whole data set are very close to those obtained from
systematic sample two. The whole data set contains 35 women and 32 men, compared with
18 women and 17 men in systematic sample two. The systematic sample accounts for both
genders almost equally. Conversely, sample one contains more men than women,
whereas the whole data set contains more women than men. I also noticed how
poorly pie charts can represent data. Small differences in the data, as in the
pie chart from sample 1, appear much larger than they are. I can see how pie charts may be
misleading when no numerical values are shown. I believe histograms are a much better way of
graphically representing data.
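
The sampling procedure and percentages described above can be sketched in Python. This is only an illustration under stated assumptions: the rows are represented by the numbers 1 to 67 (35 women plus 32 men), the function names `systematic_sample` and `percent` are hypothetical, and the counts of 15 women and 20 men in sample 1 are implied by the reported 42.86% and 57.14% of n=35 rather than stated directly.

```python
def systematic_sample(rows, step=2, size=35):
    """Take every `step`-th row starting with the first row. If that falls
    short of `size`, continue with the first rows that were not yet selected
    (sampling without replacement)."""
    picked = list(rows[::step])
    if len(picked) < size:
        chosen = set(picked)
        remaining = [r for r in rows if r not in chosen]
        picked.extend(remaining[: size - len(picked)])
    return picked[:size]


def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to 2 decimals."""
    return round(100 * count / total, 2)


# Assumption: the data set has 67 rows, numbered 1..67.
rows = list(range(1, 68))
sample = systematic_sample(rows)

print(len(rows[::2]))   # rows reached by counting by twos from row 1
print(len(sample))      # size after adding the first remaining row
print(sample[-1])       # the row that was added last

# Percent women, percent men (sample 1 counts are implied, see above).
print("sample 1:", percent(15, 35), percent(20, 35))
print("sample 2:", percent(18, 35), percent(17, 35))
print("whole data set:", percent(35, 67), percent(32, 67))
```

Counting by twos from row 1 reaches 34 of the 67 rows, so one more row (row 2) is needed to reach n=35, which matches the description in item 2.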

Part 3: Sample Statistics

1. Sample statistics for the two group-selected samples:
Simple random sample, n=35:
Mean: 65.97 kg
Standard deviation: 10.81 kg
Five-number summary:
Minimum: 50
Q1: 57
Median (Q2): 65
Q3: 71
Maximum: 94

Systematic sample, n=35 (every 5th row):
Mean: 64.34 kg
Standard deviation: 17.03 kg
Five-number summary:
Minimum: 50
Q1: 56
Median (Q2): 64
Q3: 70
Maximum: 94




















2. Frequency Histogram and Box Plot for each sample

Graphs for Simple Random Sample n=35:

Graphs for Systematic Sample n=35:

4. I wish there were more to contrast between the graphs of the two samples.
However, both are asymmetric and skewed to the right. This is not surprising
because the summary statistics of the two samples are very similar. It is reassuring to know that
two different approaches to obtaining a sample yielded very similar results.
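
The summary statistics reported for each sample in Part 3 (mean, standard deviation, five-number summary) could be computed along the following lines. This is a minimal sketch, not the method used for the project: the helper name `summarize` is hypothetical, the weights are made-up illustration values rather than the project data, and the "inclusive" quartile method is an assumption, since StatCrunch and the TI-84 may use slightly different quartile conventions.

```python
import statistics


def summarize(values):
    """Return the mean, sample standard deviation, and five-number summary."""
    # Assumption: "inclusive" quartiles; other tools may interpolate differently.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample SD, divides by n - 1
        "five_number": (min(values), q1, median, q3, max(values)),
    }


# Hypothetical weights in kg, for illustration only (NOT the project data).
weights_kg = [52, 55, 58, 61, 64, 67, 72, 79, 90]

summary = summarize(weights_kg)
print("Mean:", round(summary["mean"], 2))
print("Standard deviation:", round(summary["stdev"], 2))
print("Five-number summary:", summary["five_number"])
```

Running the same function on each sample's September weights would give two summaries that can be compared side by side.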


Part 4: Confidence intervals
1. Confidence interval for population proportion: The sample proportion $\hat{p} = 0.571$ is the
best point estimate of the population proportion $p$. The 95% confidence interval of the
population proportion for women was $0.407 < p < 0.735$. In this manner, we are 95%
confident that the interval from 0.407 to 0.735 actually does contain the true value of the
population proportion $p$.
Confidence interval for the population mean: The sample mean $\bar{x} = 65.1$ is the best point
estimate of the population mean $\mu$. The 95% confidence interval for the population mean
for September Weight in Kilograms is $62.3 < \mu < 69.7$. Due to this, we are 95% confident
that the interval 62.3 to 69.7 actually does contain the true value of $\mu$.
Confidence interval for population standard deviation: The sample standard deviation
$s = 11.3$ is the best point estimate of the population standard deviation $\sigma$. The 95%
confidence interval of the population standard deviation for September Weight in
Kilograms is $9.194 < \sigma < 15.296$. Based on these calculations, we are 95% confident that the
limits of 9.194 to 15.296 contain the true value of $\sigma$.
2. Each of the confidence intervals captured the value of its population parameter. The
population contains 35 females and 32 males, so the population proportion of females is
35/67 = 0.522. The 95% confidence interval for the population proportion is 0.407 to 0.735,
which captures this value. In terms of counts, the interval means the population of 67
should contain between 27 and 49 women, which it does. Likewise, the population mean of
65.1 is captured within the 95% confidence interval of 62.3 to 69.7. The same goes for the
standard deviation: the 95% confidence interval of the population standard deviation is
9.194 to 15.296, and these limits contain the population standard deviation of 11.3. It is very cool
that our intervals captured the population parameters and that we actually got to compute
them and see them correspond to a project that we are personally working on.
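
As a check on the arithmetic in Part 4, the proportion and mean intervals can be recomputed with a short Python sketch. The function names are hypothetical. The mean interval assumes the Part 3 simple random sample values (mean 65.97, standard deviation 10.81), because those reproduce the reported limits of 62.3 and 69.7, and it assumes a t-table critical value of 2.032 for 34 degrees of freedom. The standard deviation interval is left out because it needs chi-square critical values.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(x_bar, s, n, t_crit):
    """t interval for a population mean; t_crit is read from a t table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Proportion: p-hat = 0.571 and n = 35 as reported in Part 4.
p_low, p_high = proportion_ci(0.571, 35)
print(round(p_low, 3), round(p_high, 3))

# Mean: ASSUMES the Part 3 sample values and t = 2.032 (df = 34, 95%).
m_low, m_high = mean_ci(65.97, 10.81, 35, t_crit=2.032)
print(round(m_low, 1), round(m_high, 1))

# Does the proportion interval capture the whole-data-set proportion 35/67?
population_p = 35 / 67
print(p_low < population_p < p_high)
```

The same two functions can be reused for the April measurements or for the systematic sample by changing the inputs.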
Part 5: Hypothesis Testing
1. Level of significance chosen: 5%

2. In an observational study following the weights of females and males during freshman
year, a sample of 35 individuals contained 51.45% females. Use a 0.05 significance level to
test the claim that the proportion of females is 49%.

$H_0: p = 0.49$
$H_1: p \neq 0.49$

Test statistic: $z = 0.290$

$\hat{p} = 0.5145$
$p = 0.49$
$q = 1 - 0.49 = 0.51$
$n = 35$

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.5145 - 0.49}{\sqrt{(0.49)(0.51)/35}} = 0.290$$

P-value: $2(1 - 0.6141) = 0.772$ (two-tailed, where 0.6141 is the area to the left of $z = 0.29$)

Fail to reject the null hypothesis because the P-value is greater than the significance level.

There is not sufficient evidence to warrant rejection of the claim that the proportion of
females is 49%.

3. In an observational study following the weights of females and males during freshman
year, the September weight (kg) of the freshmen is summarized by $n = 35$, $\bar{x} = 65.97$, and
$s = 10.81$. Use a 0.05 significance level to test the claim that the September weight during
freshman year had a mean of 61 kg.

$H_0: \mu = 61$
$H_1: \mu \neq 61$

Test statistic: $t = 2.72$

$\bar{x} = 65.97$
$n = 35$
$s = 10.81$

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{65.97 - 61}{10.81/\sqrt{35}} = 2.72$$

P-value = 0.01

Reject $H_0$ because the P-value is less than the significance level of 0.05.

There is sufficient evidence to warrant rejection of the claim that the sample is from a
population with a mean weight of 61 kg in September of freshman year.

4. Given this information, the hypothesis test for the population mean weight in kg is
consistent with the actual population mean (from the original whole data set) of 65.1,
since the test rejected a mean of 61 kg. I'm sure I
didn't form a correct hypothesis test, however. As for the hypothesis test regarding
the population proportion, the population (the original whole data set)
is 52.24% female. Also, when I drew another simple random sample in StatCrunch,
the proportion of females in a sample of 35 was 57.14%. The hypothesis test for the
population proportion found that there was not sufficient evidence to warrant rejection of
the claim, and so I guess the hypothesis test regarding the population proportion is
supported, because the claim can't actually be rejected when the
P-value is greater than the significance level. Okay, am I right here? Partially right?
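
The two test statistics in Part 5 can be re-derived with a short Python sketch. It uses only the values listed in items 2 and 3; the function names are hypothetical. The P-value for the proportion test is computed as a two-tailed area under the standard normal curve. Because the standard library has no t distribution, the t statistic is compared against an assumed table critical value of 2.032 (34 degrees of freedom, 0.05 significance, two-tailed) instead of computing its P-value.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """z statistic and two-tailed P-value for a claim about a proportion."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def one_sample_t(x_bar, mu0, s, n):
    """t statistic for a claim about a population mean."""
    return (x_bar - mu0) / (s / sqrt(n))


alpha = 0.05

# Item 2: p-hat = 0.5145, claimed p = 0.49, n = 35.
z, p_value = one_proportion_z(0.5145, 0.49, 35)
print(round(z, 2), round(p_value, 2))
print("reject H0" if p_value < alpha else "fail to reject H0")

# Item 3: x-bar = 65.97, claimed mean = 61, s = 10.81, n = 35.
t = one_sample_t(65.97, 61, 10.81, 35)
T_CRIT = 2.032  # ASSUMED t-table value for df = 34, two-tailed, alpha = 0.05
print(round(t, 2))
print("reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```

Both decisions agree with the conclusions stated in items 2 and 3: the proportion claim is not rejected, and the claim of a 61 kg mean is rejected.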

Part 6: Reflection
Many people hear the word "statistics" and automatically assume they will be learning a
different form of mathematics. I was the same way. I believed, as with the many mathematics
courses I have taken, that there was nothing beyond a letter grade or beyond the actual
course itself. To be honest, I would say there are dozens of algebraic equations that I will
never use in my lifetime. I'm not saying that these equations are not important and don't
have real-life applications; however, it's hard to see these applications beyond the scope of a
textbook, and there is truth in the saying, "When will I ever use this?"
Statistics is completely different from what I had expected. It truly is a subject that goes
beyond the final letter grade and follows us into real life. There isn't a single person who
wouldn't benefit from the knowledge acquired in a statistics course. I would say the striking
difference between other mathematics courses I have taken and this statistics course is
critical thinking. I couldn't get by on solely knowing an equation. It required me to put
thought into what I was doing, and it pushed me to ask important questions about the
information put before me and whether I was just going to take it at face value or
instead approach it from the position of an informed consumer.
This term project taught me to do something I had never before been able to do. I was able
to take a set of data and scrutinize it with my own statistical analysis. I was able to use rules
and equations to determine whether or not the data was something I could trust. I learned
about the various ways to graphically portray data and how some ways are much more
advantageous than others. I learned how to sum up a set of data purely with summary
statistics, how to construct confidence intervals, and how to test claims about the
population's proportion and mean. Less importantly but to a mind-blowing degree, I
learned that whoever created the algorithms used to construct the glorious technology of
the TI-84 should be given their weight in gold.
One of the ways I'm most interested in applying what I've learned this semester is in
the scrutiny of statistical information given for various medical topics. Healthcare is a
billion-dollar industry with various special interest groups forcing the hands of politicians
and researchers in order to construct policy and research studies in their favor. Medicine is
my passion. I want to know everything I can about everything there is, and I simply can't get
enough. But I also need to know that I can trust the information being presented and not
just assume it's coming from a reputable, independent source. I now have the tools to look at
statistical data and determine its worth and whether or not it's something I will find useful
and valuable in terms of my own professional standard of care.
